Schistosomiasis is a water-based disease that is believed to affect over 200 million people with an estimated 97% of the infections concentrated in Africa. However, these statistics are largely based on population re-adjusted data originally published by Utroska and colleagues more than 20 years ago. Hence, these estimates are outdated due to large-scale preventive chemotherapy programs, improved sanitation, water resources development and management, among other reasons. For planning, coordination, and evaluation of control activities, it is essential to possess reliable schistosomiasis prevalence maps.
We analyzed survey data compiled on a newly established open-access global neglected tropical diseases database (i) to create smooth empirical prevalence maps for Schistosoma mansoni and S. haematobium for individuals aged ≤20 years in West Africa, including Cameroon, and (ii) to derive country-specific prevalence estimates. We used Bayesian geostatistical models based on environmental predictors to take into account potential clustering due to common spatially structured exposures. Prediction at unobserved locations was facilitated by joint kriging.
Our models revealed that 50.8 million individuals aged ≤20 years in West Africa are infected with either S. mansoni, or S. haematobium, or both species concurrently. The country prevalence estimates ranged between 0.5% (The Gambia) and 37.1% (Liberia) for S. mansoni, and between 17.6% (The Gambia) and 51.6% (Sierra Leone) for S. haematobium. We observed that the combined prevalence for both schistosome species is two-fold lower in The Gambia than previously reported, while we found an almost two-fold higher estimate for Liberia (58.3%) than reported before (30.0%). Our predictions are likely to overestimate overall country prevalence, since modeling was based on children and adolescents up to the age of 20 years who are at highest risk of infection.
We present the first empirical estimates for S. mansoni and S. haematobium prevalence at high spatial resolution throughout West Africa. Our prediction maps allow prioritizing of interventions in a spatially explicit manner, and will be useful for monitoring and evaluation of schistosomiasis control programs.
Schistosomiasis is a parasitic disease caused by a blood fluke that mainly occurs in Africa. Current prevalence estimates of schistosomiasis are based on historical data, and hence might be outdated due to control programs, improved sanitation, and water resources development and management (e.g., construction of large dams and irrigation systems). To help planning, coordination, and evaluation of control activities, reliable schistosomiasis prevalence estimates are needed. We analyzed compiled survey data from 1980 onwards for West Africa, including Cameroon, focusing on individuals aged ≤20 years. Bayesian geostatistical models were implemented based on environmental and climatic predictors to take into account potential spatial clustering within the data. We created the first smooth data-driven prevalence maps for Schistosoma mansoni and S. haematobium at high spatial resolution throughout West Africa. We found that an estimated 50.8 million West Africans aged ≤20 years are infected with schistosome blood flukes. Country prevalence estimates ranged between 0.5% (in The Gambia) and 37.1% (in Liberia) for S. mansoni and between 17.6% (in The Gambia) and 51.6% (in Sierra Leone) for S. haematobium. Our results allow prioritization of areas where interventions are needed, as well as monitoring and evaluation of the impact of control activities.
Citation: Schur N, Hürlimann E, Garba A, Traoré MS, Ndir O, et al. (2011) Geostatistical Model-Based Estimates of Schistosomiasis Prevalence among Individuals Aged ≤20 Years in West Africa. PLoS Negl Trop Dis 5(6): e1194. doi:10.1371/journal.pntd.0001194
Editor: Simon Brooker, London School of Hygiene & Tropical Medicine, United Kingdom
Received: November 1, 2010; Accepted: April 22, 2011; Published: June 14, 2011
Copyright: © 2011 Schur et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: NR and EH are grateful for the financial support of the EU-funded CONTRAST project (www.eu-contrast.eu/). This investigation received further financial support from the Swiss National Science Foundation for JU (project no. PPOOB-102883 and PPOOB-119129) and PV (project no. 325200-118379). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Schistosomiasis is a water-based disease caused by trematodes of the genus Schistosoma. The five schistosome species that are known to infect humans are Schistosoma mansoni, S. haematobium, S. intercalatum, S. mekongi, and S. japonicum. School-aged children are at highest risk of infection and are the main target group for interventions.
Despite successful efforts to control schistosomiasis in different parts of the world, more than 200 million individuals are still estimated to be infected and the annual global burden due to schistosomiasis might exceed 4.5 million disability-adjusted life years (DALYs) lost. A substantial amount of this burden is concentrated in West Africa, including Cameroon. Indeed, 72 million infections are thought to occur in this part of the world. However, the current statistics, as presented by Chitsulo et al. (2000), Steinmann et al. (2006), and Utzinger et al. (2009), are largely based on population re-adjusted data originally published by Utroska and colleagues in the late 1980s. Hence, the estimates are likely to be outdated due to, among other reasons, large-scale preventive chemotherapy campaigns, improved sanitation, water resources development and management, and socio-economic development.
Recently, donors have provided new funds to control the so-called neglected tropical diseases (NTDs), including schistosomiasis. For cost-effective planning and evaluation of control activities, it is essential to have reliable baseline maps of the geographical distribution of at-risk population and disease burden. Early schistosomiasis mapping efforts have been based on climatic suitability thresholds. These maps are not reliable because they are not based on disease data. Apart from a few studies, empirical maps of disease distribution over large areas are not available since there is a paucity of contemporary large-scale survey data.
The first comprehensive compilation of historical schistosomiasis prevalence surveys at a global scale was carried out by Doumenge et al. in the mid-1980s. More recent collections are available by Brooker et al. (2010) for soil-transmitted helminthiasis and schistosomiasis, but data access is limited. The European Union (EU)-funded CONTRAST project initiated the development of an open-access global NTD database, which is updated in real time (GNTD database; http://www.gntd.org). A key objective of CONTRAST is to employ this database for large-scale schistosomiasis prevalence mapping and prediction in sub-Saharan Africa for the spatial refinement of control interventions and the cost-effective allocation of resources.
Geographical locations in close proximity share common exposures which influence the disease outcome similarly. The geographical information of the survey locations in the GNTD database allows potential spatial correlation to be taken into account, and therefore more realistic models to be built. Standard statistical modeling approaches assume independence between locations. Ignoring potential spatial correlation in neighboring areas due to common exposures could result in incorrect model estimates. Geostatistical models take spatial clustering into account by introducing location-specific random effects whose covariance matrix is a function of the distance between locations. Such models typically contain large numbers of parameters and cannot be estimated by the commonly used maximum likelihood approaches. Bayesian model formulations enable model fit via Markov chain Monte Carlo (MCMC) simulations.
Bayesian geostatistical models have been applied in mapping schistosomiasis at different spatial scales, for example by Raso et al. (2005) in the region of Man, western Côte d'Ivoire, and Clements et al. (2008) in Mali, Niger, and Burkina Faso. Brooker et al. (2010) developed a global predictive map highlighting those areas where preventive chemotherapy against schistosomiasis and soil-transmitted helminthiasis is warranted. However, to our knowledge, neither a model-based large-scale prevalence map for S. haematobium or S. mansoni nor spatially explicit burden estimates exist for the whole West African region.
In this paper, we developed Bayesian geostatistical models based on environmental and climatic risk factors to obtain reliable empirical schistosomiasis prevalence maps for individuals aged ≤20 years by analyzing the GNTD data for West Africa, including Cameroon. Prediction was based on joint kriging in order to summarize the results as population-adjusted country prevalence estimates. Emphasis was placed on the distribution of S. haematobium and S. mansoni. We neglected S. intercalatum due to low infection risks, especially outside Cameroon.
The GNTD database was used to obtain prevalence data on schistosomiasis. This database assembles general information about the type of publication, authors, and publication year, as well as study-specific information about survey population, survey period, Schistosoma species, diagnostic test employed, and the number of infected individuals among those examined, stratified by age and sex (if available). Hospital studies, data on specific susceptible groups (such as HIV positives), and post-intervention studies were not included in the database. For this study, we analyzed all point-level data on settled populations in West Africa on either S. haematobium or S. mansoni: 4550 and 2611 survey locations, respectively. We excluded (i) surveys with missing geographical coordinates; (ii) surveys with missing numbers of individuals screened; (iii) surveys carried out before 1980; (iv) individuals aged >20 years; and (v) entries based on certain diagnostic techniques. With regard to the last exclusion criterion, we rejected all non-direct diagnostic examination techniques, such as immunofluorescence tests, antigen detections or questionnaire data, and direct fecal smears that have very low diagnostic sensitivities (overall, 4% of the data for S. mansoni and 0.1% for S. haematobium were excluded). Hence, the surveys included were mainly based on the Kato-Katz thick smear method (S. mansoni) and urine filtration or sedimentation (S. haematobium). Sensitivity and specificity of the diagnostic techniques were not incorporated in the model because the sampling effort (e.g., number of stool samples, number of slides examined under a microscope, etc.), which affects diagnostic accuracy, is usually unknown.
We assumed that the proportion of rejected diagnostic techniques among the data with missing information on the technique (S. mansoni: 33.5% missing, S. haematobium: 20.6% missing) is similar. Therefore, we considered the bias that would arise from ignoring the missing data as larger than the bias from potentially rejected diagnostic techniques among the missing data. A separate model validation on the reduced datasets confirmed that by including data with incomplete records the predictive ability increased compared to the model excluding this information (results not presented).
Climatic, environmental, and population data
Climatic, environmental, and population data were obtained from different freely accessible remote sensing data sources, as summarized in Table 1. Data on day and night temperature were extracted from land surface temperature (LST) data. The normalized difference vegetation index (NDVI) was used as a proxy for vegetation. Digitized maps on freshwater body sources (e.g., rivers, lakes, and wetlands) in West Africa were acquired with the characteristic of being either perennial or temporary.
Processing of the MODIS/Terra data was carried out using the ‘MODIS Reprojection Tool’ and code implemented in Fortran 90 to summarize the temporal changes by an overall yearly average based either on the mean (NDVI, day and night LST) or the mode (land cover). Furthermore, the land cover categories, as defined by the International Geosphere-Biosphere Programme, were re-grouped into six categories as follows: (i) sparsely vegetated; (ii) deciduous forest and savanna; (iii) evergreen forest; (iv) cropland; (v) urban; and (vi) wet areas. Rainfall estimates were processed via the software IDRISI 32. Yearly averaged rainfall was calculated as a summary measure. Distance calculations to the nearest freshwater body source were done in ArcMap version 9.2 of the Environmental Systems Research Institute (ESRI; Redlands, CA, USA).
A classification scheme of West Africa into ecological zones was obtained using a demo version of the Earth Resources Data Analysis System Imagine 9.3 software. The datasets were subjected to an unsupervised classification, via the ‘Iterative Self-Organizing Data Analysis Technique’ (ISODATA), to map areas of environmental clustering which were further summarized into five main classes based on between-class similarities. The resulting map matched existing classifications and the classes can be interpreted as (i) desert/semi-desert; (ii) sahelian zone; (iii) savannah; (iv) forest; and (v) tropical rainforest.
Population count data obtained from LandScan for 2008 were converted to 5×5 km spatial resolution and adjusted to 2010 using country-specific average annual rates of change for 2005–2010 provided by the United Nations (UN). Estimates for the percentage of individuals aged ≤20 years among the total population per country were extracted from the U.S. Census Bureau International Database for the year 2010. Population counts were multiplied by this percentage to obtain the number of individuals aged ≤20 years per pixel. The estimated number of infected individuals ≤20 years was calculated by combining a sample of the joint predictive posterior distribution of the disease prevalence predicted at pixel level with the population size of that age group within the pixel. The predictive posterior distribution of the number of infected individuals per country was estimated by summing up the pixel-samples and calculating summary statistics. The combined schistosomiasis prevalence (infection with S. mansoni or S. haematobium or both) was calculated on the assumption that the two infections are independent from each other, as P(Schistosoma spp.) = P(S. mansoni) + P(S. haematobium) − P(S. mansoni) × P(S. haematobium), where P denotes prevalence.
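The calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the data layout (one list of pixel-level prevalences per posterior draw), and the choice to apply the independence formula per pixel and per draw are all assumptions made for illustration.

```python
import statistics


def combined_prevalence(p_mansoni, p_haematobium):
    """Prevalence of infection with either or both species,
    assuming the two infections are independent."""
    return p_mansoni + p_haematobium - p_mansoni * p_haematobium


def infected_per_draw(draws_mansoni, draws_haematobium, population):
    """Total number of infected individuals for each posterior draw.

    draws_*    : list of draws; each draw is a list of pixel prevalences (0..1)
    population : list of pixel-level counts of individuals aged <=20 years
    (hypothetical layout, chosen only for this sketch)
    """
    totals = []
    for pm_draw, ph_draw in zip(draws_mansoni, draws_haematobium):
        total = sum(
            combined_prevalence(pm, ph) * n
            for pm, ph, n in zip(pm_draw, ph_draw, population)
        )
        totals.append(total)
    return totals


# Toy example: two posterior draws over two pixels (made-up numbers).
draws_m = [[0.1, 0.2], [0.0, 0.5]]
draws_h = [[0.1, 0.0], [0.2, 0.5]]
pop = [100, 200]
totals = infected_per_draw(draws_m, draws_h, pop)
median_infected = statistics.median(totals)  # one possible summary statistic
```

Applied to the country-level figures for The Gambia (0.5% and 17.6%), `combined_prevalence(0.005, 0.176)` gives roughly 0.180, close to the reported 18.1%; exact agreement should not be expected, since the published estimate is summarized from pixel-level posterior samples.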
Extraction of the remotely sensed data at the survey locations and at the prediction locations for the two databases was performed via self-written Fortran 90 code. The prediction surface for West Africa was built in ArcMap with a spatial resolution of 0.05°×0.05° (approximately 5×5 km) resulting in approximately 220,000 pixels covering the study region. The data were displayed in ArcMap.
For each Schistosoma species, bivariate logistic regressions were performed in STATA/IC 10.1 in order to assess potential covariates in relation to the outcome (the number of infected individuals over the number of individuals screened per location). Continuous covariates were categorized into four groups based on quartiles to account for potential non-linearity in the outcome-predictor relationship on the logit scale. The Bayesian information criterion (BIC) was employed to determine whether the linear or the categorized form of each covariate on the logit scale had the smaller BIC and therefore predicted the outcome more accurately. We used the following covariates in both linear and categorical scales: altitude, day LST, night LST, rainfall, NDVI, and distance to the nearest freshwater body. The type of freshwater body, ecological zone, and land cover were treated as categorical variables.
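As a rough illustration of the quartile categorization and the BIC comparison, the following Python sketch uses only the standard library. The altitude values are invented, the helper names are hypothetical, and the log-likelihoods would in practice come from fitted logistic regressions (done in STATA in the study), which are not reproduced here.

```python
import math
import statistics


def quartile_category(value, cut_points):
    """Return 0..3, the index of the quartile group that `value` falls into."""
    return sum(value > c for c in cut_points)


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L); smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical altitude values (metres) at eight survey locations.
altitude = [12, 40, 95, 180, 260, 410, 520, 900]
cuts = statistics.quantiles(altitude, n=4)  # three quartile cut points
groups = [quartile_category(a, cuts) for a in altitude]

# Linear form: intercept + slope (2 parameters).
# Categorical form: intercept + 3 dummy variables (4 parameters).
# The log-likelihood values below are placeholders, not fitted results.
bic_linear = bic(log_likelihood=-100.0, n_params=2, n_obs=50)
bic_categorical = bic(log_likelihood=-100.0, n_params=4, n_obs=50)
```

With equal log-likelihoods the linear form has the smaller BIC because of its lower parameter count; the categorical form is preferred only when its gain in likelihood outweighs the penalty for the extra parameters.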
The study year was also included as linear and categorical covariate in order to account for possible temporal trends. The categories were defined on decades as follows: 1980–1989, 1990–1999, and from 2000 onwards. For S. mansoni, half of the data were from the 1980s (49.7%), 24.1% from the 1990s, whereas 26.2% were obtained in the new millennium. For S. haematobium, 37.8% of the data stem from the 1980s, 35.7% from the 1990s, and 26.5% from 2000 onwards.
Relevance of continuous or categorized covariates to predict the outcome was assessed based on p-values resulting from likelihood ratio tests (LRTs) at significance levels of 0.15. All significant covariates were included in the Bayesian analysis.
Bayesian geostatistical logistic regression models were fitted with location-specific random effects. Spatial correlation was modeled assuming that the random effects follow a multivariate normal distribution with variance-covariance matrix related to an exponential correlation function between any pair of locations. Model fit requires the inversion of this matrix. Due to the large number of survey locations in our datasets, parameter estimation becomes unfeasible. An approximation of the spatial process by a subset of survey locations proposed by Banerjee et al. (2008) and further developed by Gosoniu et al. (forthcoming) and Rumisha et al. (forthcoming) was implemented instead. We employed MCMC simulation to estimate the model parameters. Prevalence of infection at 220,000 locations was predicted for the most recent decade (from the year 2000 onwards) via Bayesian kriging using joint predictive posterior distributions. Due to computational issues, we modeled the multivariate Gaussian spatial process separately for each country. The performance of the models was assessed using model validation via different approaches: mean predictive errors (ME), mean absolute predictive errors (MAE), discriminatory performance on a 50% prevalence cut-off, and Bayesian credible interval (BCI) comparisons. Further details pertaining to the Bayesian geostatistical model, sub-sampling, and model validation approaches are given in the Appendix S1.
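To make the covariance structure concrete, the sketch below builds a variance-covariance matrix from an exponential correlation function. It is an illustration only: the parameterization sigma2 × exp(−decay × distance) is a common convention but the exact form used in the study is given in Appendix S1, planar coordinates in kilometres stand in for a proper geographic distance, and defining the range as the distance at which correlation drops to 5% is an assumption.

```python
import math


def exponential_correlation(distance_km, decay):
    """Correlation between two locations separated by `distance_km`."""
    return math.exp(-decay * distance_km)


def covariance_matrix(coords_km, sigma2, decay):
    """Variance-covariance matrix of the spatial random effects.

    coords_km : list of (x, y) planar coordinates in km (assumed layout)
    sigma2    : spatial variance
    decay     : rate at which correlation decays with distance
    """
    n = len(coords_km)
    cov = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords_km):
        for j, (xj, yj) in enumerate(coords_km):
            d = math.hypot(xi - xj, yi - yj)
            cov[i][j] = sigma2 * exponential_correlation(d, decay)
    return cov


def practical_range(decay, threshold=0.05):
    """Distance at which correlation falls to `threshold` (5% assumed here)."""
    return -math.log(threshold) / decay


# Three made-up survey locations and illustrative parameter values.
coords = [(0.0, 0.0), (30.0, 40.0), (300.0, 400.0)]
cov = covariance_matrix(coords, sigma2=4.0, decay=0.0075)
```

The matrix has one row and one column per survey location, and inverting a dense matrix of that size scales roughly with the cube of the number of locations, which is why a reduced set of locations is attractive for datasets of this size.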
Final datasets and preliminary statistics
A schematic overview of the study profile on obtaining prevalence data on schistosomiasis from the GNTD is given in Figure 1. The final datasets consisted of 1993 and 1179 survey locations for S. haematobium and S. mansoni, respectively, out of which 1722 and 1094 locations were unique. Observed prevalence of the survey locations ranged from 0% to 100% for each Schistosoma species with mean prevalence of 31.0% (median 15.0%, standard deviation (SD) 29.0%) for S. haematobium, and 17.7% (median 0.0%, SD 24.4%) for S. mansoni. The distribution and the prevalence level of the survey locations are shown in Figures 2 and 3 for S. haematobium and S. mansoni, respectively. An overview of the number of surveys with details given regarding sampling period, diagnostic technique, survey type, and mean prevalence, stratified by country, is given in Table 2.
Schematic overview of the study profiling process. The numbers in brackets in the acute-angled boxes represent the number of survey locations (which may not be unique) included in the current GNTD dataset, while the numbers outside the boxes represent the number of surveys dropped for the reason given in the boxes with rounded corners.
Observed prevalence of S. haematobium among individuals aged ≤20 years across West Africa, including Cameroon.
Observed prevalence of S. mansoni among individuals aged ≤20 years across West Africa, including Cameroon.
Spatial distributions of potential covariates influencing the distribution of schistosomiasis are presented in Figure 4. Bivariate logistic regressions of the continuous factors in relation to the disease outcomes showed that categorical variables predicted better based on BIC values than linear variables for both Schistosoma species (results not presented). Each potential covariate considered for the analyses had a p-value of <0.001 based on LRTs and was therefore included in the multivariate analyses. Backwards logistic regressions demonstrated the importance of the whole set of covariates for each species. The resulting odds ratios (ORs) of bivariate and multivariate non-spatial logistic regressions are summarized in Table 3 for S. haematobium, and Table 4 for S. mansoni. The only non-significant outcome-predictor relations in a multivariate framework for the former species were yearly averaged precipitation between 300 mm and 399 mm, and NDVI levels between 0.33 and 0.52. For the latter species, only altitude levels of at least 500 m above sea level and night LSTs between 20.0°C and 20.7°C were non-significant.
Spatial distribution of remotely sensed covariates for West Africa, including Cameroon. Climatic covariates were summarized via yearly averages.
Spatial modeling outcomes
Model parameter estimates for S. haematobium and S. mansoni are presented in Table 3 and Table 4, respectively. Introduction of spatial correlation led to changes in the significance of covariates and the direction of outcome-predictor relations compared to the corresponding non-spatial multivariate logistic regression models. For example, the influence of rainfall for S. mansoni became more important while the effect of the survey period and non-perennial freshwater bodies was reduced. The spatial range was estimated to be 398 km (95% BCI: 384–412 km) and 387 km (95% BCI: 375–402 km) for S. haematobium and S. mansoni, respectively. These estimates suggest strong spatial correlation for both species. The spatial variation was similar for the two species (4.02 for S. haematobium vs. 4.05 for S. mansoni).
Schistosomiasis prevalence maps
Figure 5A presents the prevalence map for S. haematobium based on the median of the predictions. Low-prevalence areas (predicted infection prevalence <10%) were primarily observed in the Sahara, Cameroon, north-west Côte d'Ivoire, and Senegal. Areas with prevalence >50% are mainly found along the Niger River, in Sierra Leone, east/central Senegal, and south Nigeria. The map of the SD of model predictions for this species (Figure 5B) demonstrates that small prediction errors were primarily found around the survey locations used for sub-sampling.
(A) Predicted median of prevalence for S. haematobium among individuals aged ≤20 years during the period of 2000–2009 based on Bayesian kriging, and (B) standard deviation (SD) of the prediction error with sub-sampled survey locations.
The median spatial S. mansoni prevalence map is shown in Figure 6A with the corresponding error presented in Figure 6B. High-prevalence areas (predicted prevalence >50%) were mainly found in north-east Liberia, east Côte d'Ivoire, west Ghana, north/central Benin, west Nigeria, north Cameroon, and central Mali in close proximity to Niger River. Very low prevalence areas (predicted prevalence <10%) were predominant in Senegal, The Gambia, Guinea-Bissau, Mauritania, and Niger. Furthermore, low prevalence areas were predicted for north Mali, south Togo, and parts of Cameroon. Areas of high prediction accuracy were found around the sub-sampled survey locations and in desert/semi-desert ecological zones.
(A) Predicted median of prevalence for S. mansoni among individuals aged ≤20 years during the period of 2000–2009 based on Bayesian kriging, and (B) standard deviation (SD) of the prediction error with sub-sampled survey locations.
At-risk population estimates
Table 5 shows population-adjusted country prevalence estimates. For S. haematobium, prevalence estimates range between 17.6% (The Gambia) and 51.6% (Sierra Leone), whereas for S. mansoni they range between 0.5% (The Gambia) and 37.1% (Liberia). S. haematobium was found to be the predominant species throughout West Africa with a difference compared to S. mansoni of up to 30% in Burkina Faso and a minimum difference of about 4% in Liberia. Combined Schistosoma prevalence estimates, assuming independence of the occurrence of the two species, varied from 18.1% (The Gambia) to 58.3% (Liberia) with high numbers of infected individuals aged ≤20 years (more than 5 million) in Ghana and Nigeria. Lower numbers (<1 million) of infected individuals aged ≤20 years were found in The Gambia, Guinea-Bissau, Liberia, and Mauritania. The overall number of infected individuals aged ≤20 years in West Africa is 50.8 million.
Model validation results
Model validation based on 80% of the survey locations resulted in MEs of −1.7 for S. haematobium and 0.0 for S. mansoni, and respective MAEs of 19.5 and 7.3. The percentage of test locations correctly predicted by 95% BCIs was 72.9% for S. haematobium, and 72.5% for S. mansoni. ME and MAE comparisons between spatial and exchangeable random effect models showed that spatial models result in better predictive ability (S. haematobium: ME = 3.8, MAE = 27.7; S. mansoni: ME = −0.8, MAE = 14.9).
Discriminatory performance based on a 50% prevalence cut-off showed that the models correctly predicted 93.2% and 76.9% of the validation locations for S. mansoni and S. haematobium, respectively. False-high predictions were obtained for 5.5% (S. mansoni) and 18.8% (S. haematobium) of the test locations.
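The validation measures reported above can be sketched in a few lines of Python. This is a hedged illustration rather than the study's implementation (details are in Appendix S1): the sign convention ME = mean(observed − predicted), under which a negative ME indicates over-prediction, and the treatment of values exactly at the cut-off as "high" are assumptions, and all numbers are made up.

```python
def mean_error(observed, predicted):
    """Mean predictive error; negative values indicate over-prediction
    under the assumed convention observed - predicted."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_error(observed, predicted):
    """Mean absolute predictive error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def bci_coverage(observed, lower, upper):
    """Percentage of test locations whose observed prevalence lies
    inside the credible interval [lower, upper]."""
    hits = sum(lo <= o <= up for o, lo, up in zip(observed, lower, upper))
    return 100.0 * hits / len(observed)


def cutoff_performance(observed, predicted, cutoff=50.0):
    """Return (% correctly classified, % false-high) for a prevalence cut-off."""
    n = len(observed)
    correct = sum((o >= cutoff) == (p >= cutoff) for o, p in zip(observed, predicted))
    false_high = sum(o < cutoff <= p for o, p in zip(observed, predicted))
    return 100.0 * correct / n, 100.0 * false_high / n


# Made-up prevalences (%) at four test locations.
obs = [10.0, 60.0, 40.0, 80.0]
pred = [20.0, 55.0, 65.0, 70.0]
lower = [5.0, 50.0, 50.0, 60.0]
upper = [30.0, 58.0, 80.0, 90.0]

me = mean_error(obs, pred)
mae = mean_absolute_error(obs, pred)
coverage = bci_coverage(obs, lower, upper)
correct_pct, false_high_pct = cutoff_performance(obs, pred)
```

Reporting the false-high share separately from overall accuracy is useful because over-predicting and under-predicting a location relative to the 50% cut-off have different consequences for targeting preventive chemotherapy.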
To our knowledge, we provide the first model-based prevalence maps for both S. haematobium and S. mansoni for individuals aged ≤20 years in West Africa, including Cameroon. We used a readily available open-access database consisting of a large number of historical and contemporary geolocated and standardized survey data, coupled with Bayesian-based geostatistical tools. Standard geostatistical methods are not able to handle large numbers of survey locations due to computational problems. Therefore, for the first time, an approximation of the spatial process was implemented in Schistosoma prevalence modeling.
In comparison to existing prevalence estimates, major shortcomings of previous studies have been addressed, and hence our prevalence maps show a higher spatial resolution and we believe that they are more accurate than earlier estimates. This claim is justified as follows. First, our estimates are based on the GNTD database that went live in July 2010, developed as part of the EU-funded CONTRAST project. As of February 2010, the GNTD contained more than 4500 and 2600 unique entries in West Africa for S. haematobium and S. mansoni, respectively. Second, data-tailored statistical methods based on Bayesian geostatistical modeling were used in order to incorporate spatial correlation between survey locations and to obtain more accurate estimates of the uncertainty of the predictions. Third, climatic and environmental covariates were employed in the models to evaluate the effect on the disease outcomes. The climatic and environmental factors were obtained at high spatial resolution to be able to predict small hotspots of risk, which could arise due to the focal distribution of schistosomiasis, an important epidemiological feature of the disease. An existing S. haematobium prevalence map for three West African countries (i.e., Burkina Faso, Mali, and Niger) using Bayesian geostatistical modeling was previously presented by Clements et al. (2008) based on data from 2004–2006. However, this map does not show the actual level of schistosomiasis prevalence but rather probabilities that the predicted prevalence is above a pre-defined cut-off, arbitrarily set at 50%. This cut-off has been proposed by the World Health Organization (WHO) to distinguish between low and high risk areas, and hence such maps are useful to detect areas where preventive chemotherapy might be warranted on an annual basis.
However, the maps do not provide detailed information for lower risk areas or the number of infected individuals and they cannot be used for monitoring and evaluation purposes following interventions. A more recent publication by Clements et al. (2009) presented a S. haematobium prevalence map for the same three West African countries. This map shows similar patterns to our map with the exception of north Burkina Faso. In this area, Clements and colleagues predicted prevalence levels of 10–20% for high and low egg-intensities, while our estimates suggest much higher prevalence (>50%). These discrepancies are most likely due to differences in the underlying survey data. The Clements et al. data were only partially included in the GNTD database as we could not access them fully.
The estimated spatial correlation for both Schistosoma species was very strong with spatial ranges of approximately 400 km. Previously reported spatial ranges in parts of West Africa vary between 7.5 km and approximately 180 km. However, these estimates were based on recent surveys, and hence influenced by recently established control programs. Interventions are likely to reduce the predictive power of environmental and climatic factors on the distribution of schistosomiasis and, thus, reduce spatial correlation. Similar effects were found for malaria, where historic data showed stronger spatial correlation than recent surveys.
We overlaid population data adjusted to 2010 on the predicted prevalence surfaces for the two Schistosoma species in order to obtain country-specific estimates of the number of infected individuals aged ≤20 years. Previous country estimates, for instance those presented by Chitsulo et al. (2000), Steinmann et al. (2006), or Utzinger et al. (2009), are interpolations of limited observations for a whole country, and hence lack empirical modeling. Chitsulo and colleagues reported a higher number of infected people for West Africa (71.8 million) compared to our estimate (50.8 million). Of note, the Chitsulo et al. estimates are based on the whole population, while our new estimates concern the age group ≤20 years. Moreover, the Chitsulo et al. estimates pertain to mid-1990s population estimates, compared to our adjusted estimates for the year 2010. In countries like Cameroon, The Gambia, Ghana, and Liberia, characterized by high rural-to-urban migration in the last decade, the Chitsulo et al. prevalence estimates should be treated with care due to rapid urbanization. Our study revealed that the combined prevalence of S. haematobium and S. mansoni in The Gambia, for example, is two-fold lower than previously reported by Chitsulo et al. (18.1% vs. 37.5%). However, in Benin, Guinea, Liberia, Nigeria, and Togo, we found prevalence estimates that are more than 10 percentage points higher than the previous estimates. On the one hand, differences might be related to sparse data, for example, in Benin, The Gambia, Guinea, Guinea-Bissau, Liberia, Mauritania, Nigeria, and Sierra Leone. On the other hand, previous estimates were not derived from model-based predictions using climatic, environmental, and disease data. Since we modeled disease prevalence among individuals aged ≤20 years (the highest-risk group), our prevalence estimates pertain to that age group and are therefore likely to overestimate the prevalence in the whole population.
We estimated the country-specific overall schistosomiasis prevalence by assuming independence between the occurrence of S. haematobium and S. mansoni in each area. However, it is conceivable that simultaneous infections with both species are more frequent than expected by chance in areas where the species co-exist, as infection pathways are similar and strongly related to behavior. Hence, the combined prevalence estimates potentially underestimate the true schistosomiasis situation in West Africa. A modeling approach via joint spatial random effects could assess the effect of potential dependence between the species, but would increase the number of spatial parameters and is therefore computationally challenging.
We might also underestimate schistosomiasis prevalence in Cameroon, Mali, and Nigeria because of the presence of S. intercalatum. We did not include this species in the analysis since the GNTD database currently only contains 17 survey locations outside Cameroon. However, it is assumed that S. intercalatum has a low prevalence and there are signs that this species is further declining in importance.
Model validation has shown that the S. haematobium predictions seem to overestimate the actual prevalence, while the S. mansoni model revealed no tendency to over- or underestimate the overall prevalence. The MAE for the S. haematobium model is nearly three times larger than the one for S. mansoni. This is expected because the mean prevalence for S. haematobium was about double that for S. mansoni. Our models correctly predict about 72% of the survey locations when considering 95% BCIs. We are encouraged by these results, since perfect predictions are rather unlikely in reality due to the complexity of disease transmission.
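The validation measures discussed above can be sketched as follows. The definitions used here are common conventions and should be read as assumptions: mean error as predicted minus observed, MAE as the mean absolute error, and coverage as the share of test locations whose observed prevalence lies inside the 95% BCI. The exact definitions used in the analysis are given in the supporting information. All input values are hypothetical.

```python
from typing import Dict, Sequence


def validation_summary(observed: Sequence[float],
                       predicted: Sequence[float],
                       lower: Sequence[float],
                       upper: Sequence[float]) -> Dict[str, float]:
    """Mean error, mean absolute error and credible-interval coverage."""
    n = len(observed)
    if n == 0 or not (len(predicted) == len(lower) == len(upper) == n):
        raise ValueError("all inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    # A positive mean error indicates overestimation on average.
    mean_error = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    # A location counts as correctly predicted if the interval contains
    # the observed prevalence.
    covered = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return {"mean_error": mean_error, "mae": mae, "coverage": covered / n}


# Hypothetical test locations (illustrative values only).
summary = validation_summary(
    observed=[0.10, 0.30, 0.00, 0.50],
    predicted=[0.15, 0.25, 0.05, 0.40],
    lower=[0.05, 0.10, 0.00, 0.20],
    upper=[0.30, 0.28, 0.15, 0.60],
)
```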
However, our models are based on assumptions that could influence model performance. We assumed that the diagnostic techniques employed have similar ability to detect an infection, but different diagnostic techniques show differences in sensitivity and specificity, which also depend on the overall prevalence and infection intensity. This might have led to an underestimation of prevalence due to the imperfect sensitivity of direct diagnostic techniques. Additional model parameters accounting for the performance of the different diagnostic techniques could be incorporated in the models. However, in the absence of detailed information regarding sampling effort, assumptions would be required, which may be debatable and introduce additional biases. We are currently examining how different approaches to addressing this issue affect the model-based predictions.
We did not adjust the outcome according to age and sex even though the age groups differ between surveys and school surveys in particular are likely to include more boys than girls due to prevailing cultural issues in many parts of West Africa. Therefore, our results are likely to be biased and potentially overestimate schistosome prevalence. However, many publications do not present stratified results by these subgroups. Age-adjustment models are feasible but difficult to implement because age-prevalence curves have to be fitted for different transmission settings. Furthermore, disease data are often reported for wide age ranges (e.g., school-aged children), and individuals might not be evenly distributed within the range, which introduces bias even when an age-prevalence model is taken into account.
Surveys are typically conducted in endemic areas, leading to high observed prevalence levels. This could result in an overestimation of prevalence in the present analysis. However, in the data we analyzed, 45% of the locations for S. haematobium and 73% for S. mansoni had an observed prevalence below 10%. We therefore assume that a location selection bias is unlikely. Another concern is the large proportion of zero outcomes (i.e., none of the study participants found to be infected), especially for S. mansoni (S. mansoni: 54.1%; S. haematobium: 20.1%). To overcome this issue, zero-inflated models need to be incorporated, which modify the likelihood function and add a model parameter capturing the over-dispersion arising from the zeros.
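A zero-inflated binomial likelihood can be sketched as a mixture of a point mass at zero and an ordinary binomial. The sketch below is a simplification: it uses one constant infection probability p and one mixing probability pi0 for all locations, whereas a geostatistical version would let p depend on covariates and spatial random effects. Function and parameter names are hypothetical.

```python
from math import comb, log
from typing import Iterable, Tuple


def zib_pmf(y: int, n: int, p: float, pi0: float) -> float:
    """Zero-inflated binomial probability of y positives among n examined.

    pi0 is the extra probability of a structural zero (e.g., no transmission).
    """
    binom = comb(n, y) * p ** y * (1.0 - p) ** (n - y)
    if y == 0:
        # Zeros arise either structurally or by chance from the binomial part.
        return pi0 + (1.0 - pi0) * binom
    return (1.0 - pi0) * binom


def zib_loglik(data: Iterable[Tuple[int, int]], p: float, pi0: float) -> float:
    """Log-likelihood for (positives, examined) pairs; requires 0 < p < 1."""
    return sum(log(zib_pmf(y, n, p, pi0)) for y, n in data)


# Hypothetical surveys: (number positive, number examined).
surveys = [(0, 50), (0, 40), (7, 60), (0, 30), (12, 80)]
loglik_plain = zib_loglik(surveys, p=0.1, pi0=0.0)   # ordinary binomial
loglik_zi = zib_loglik(surveys, p=0.1, pi0=0.4)      # with zero inflation
```

Setting pi0 to zero recovers the ordinary binomial likelihood, which makes the two models directly comparable on the same data.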
The models presented in this manuscript included only spatial random errors, and hence we ignored potential measurement errors. Inclusion of location-specific non-spatial error terms might have improved model predictions. However, location-specific non-spatial error terms would have doubled the number of error terms, leading to highly parameterized models.
We further assumed isotropic stationary models. Non-stationary models imply that the spatial random effect varies from one region to another and is not stable throughout the study area. Semi-variogram comparisons suggested such non-stationarity for S. mansoni, as the estimated spatial range parameters differ between eco-zones. However, semi-variogram analyses did not indicate non-stationarity in the spatial distribution of S. haematobium. Isotropic models assume that the spatial correlation is the same within the same distance irrespective of direction. This assumption might not be valid since intermediate host snails spread along rivers and lakeshores and, therefore, introduce direction-dependent correlation.
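An isotropic empirical semi-variogram depends only on the distance between pairs of locations, as in the minimal sketch below. It assumes planar coordinates and Euclidean distances, which is a simplification for longitude and latitude data, and the function name and input values are hypothetical. Computing the same quantity separately per eco-zone, or separately for pairs grouped by direction, is one way to look for non-stationarity or anisotropy.

```python
from math import hypot
from typing import List, Optional, Sequence, Tuple


def empirical_semivariogram(coords: Sequence[Tuple[float, float]],
                            values: Sequence[float],
                            bin_width: float,
                            n_bins: int) -> List[Optional[float]]:
    """Semi-variance per distance bin; None for bins without any pair."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            b = int(d // bin_width)  # bin b covers [b * width, (b + 1) * width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs in the bin)
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical locations on a line with illustrative values.
gamma = empirical_semivariogram(
    coords=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    values=[1.0, 2.0, 4.0],
    bin_width=1.0,
    n_bins=3,
)
```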
The choice and size of sub-sampled locations required to adequately approximate the spatial Gaussian process is a research area of its own in spatial statistics. Many different approaches are available to optimize selection. We implemented a method based on semi-variogram comparisons. This selection aims to preserve the spatial surface of the original dataset. However, it might fail to identify a sub-sample that minimizes the prediction error. The spatially averaged predictive variance (SAPV) method proposed by Finley aims to optimize the variance in the predictions, but its implementation is computationally highly demanding.
Time-dependent covariates, such as climatic factors, might have changed between the 1980s and the 2000s. However, our geographical covariates were solely based on recent remote sensing data (from 2000 onwards), because historical remote sensing data are, to our knowledge, not freely available at high spatial and temporal resolution. The long-run averages of the recent data enable us to maintain high spatial resolution, although they cannot capture variation in the observed outcome due to unusual climatic conditions or climate change that might have occurred since the 1980s and 1990s.
Preliminary residual analyses suggest that there is only weak temporal correlation in the data. We therefore modeled only a spatial rather than a spatio-temporal process. This led to a more parsimonious model and facilitated model fit. Nevertheless, we incorporated temporal trends in the prevalence estimation by including the survey year as a covariate. For both Schistosoma species, the predicted prevalence was highest during the 1990s. This increase might be explained by water resources development and management activities (e.g., the construction of dams and irrigation systems), political unrest, and civil restructuring. Water resources development and management projects might have improved the suitability of the environment for intermediate host snails, which might have spread into previously snail-free zones together with the parasites. Since the beginning of the new millennium, a number of large-scale preventive chemotherapy programs have been underway in parts of West Africa, and it will be important to monitor how the prevalence of schistosomiasis changes in space and over time. The effectiveness of control interventions may vary across areas but, to our knowledge, a comprehensive database compiling this information with high spatio-temporal resolution has yet to be established.
In conclusion, the country-specific Schistosoma prevalence estimates and numbers of individuals aged ≤20 years infected with either S. mansoni, or S. haematobium, or both species concurrently presented here are useful tools for disease control managers and other stakeholders to support decision-making on interventions. Our maps can also serve as a benchmark to monitor the impact of control interventions and for long-term evaluation of transmission dynamics. Model-based estimates in areas with scarce data and high uncertainty could be improved by additional surveys to enhance our knowledge of the distribution of schistosomiasis and disease burden. We plan to further expand this work to other regions and address the issues of non-stationarity, diagnostic sensitivity, and age-heterogeneity across surveys. Finally, we will test the assumption of independence between the Schistosoma species to improve the accuracy of the joint prevalence estimates.
Geostatistical model-based estimates of the prevalence of schistosomiasis in West Africa for individuals aged 20 years or younger - Translation of abstract into German by Nadine Schur.
Supporting information on geostatistical model formulation, spatial process approximation and model validation.
Many thanks are addressed to Dr. Anna-Sofie Stensgaard for her work related to the GNTD database. Special thanks go to Mr. Dominic Gosoniu and Ms. Susan Rumisha for further development and implementation of the spatial process approximation to handle large datasets. We are also grateful to all our collaborators from Benin, Burkina Faso, Cameroon, Côte d'Ivoire, The Gambia, Ghana, Guinea, Liberia, Mali, Mauritania, Niger, Nigeria, Senegal, and Togo who contributed geolocated schistosomiasis survey data for the GNTD database.
Analyzed the data: NS PV. Wrote the paper: NS PV JU. Conceptualized the project: PV JU TKK. Provided substantial amount of data: AG MST ON RCR LATT. Processed data: NS EH. Provided important intellectual content: AG MST ON RCR LATT TKK JU PV.
- 1. WHO (2002) Prevention and control of schistosomiasis and soil-transmitted helminthiasis: report of a WHO expert committee. WHO Tech Rep Ser 912: 1–57.
- 2. Steinmann P, Keiser J, Bos R, Tanner M, Utzinger J (2006) Schistosomiasis and water resources development: systematic review, meta-analysis, and estimates of people at risk. Lancet Infect Dis 6: 411–425.
- 3. King CH, Dickman K, Tisch DJ (2005) Reassessment of the cost of chronic helmintic infection: a meta-analysis of disability-related outcomes in endemic schistosomiasis. Lancet 365: 1561–1569.
- 4. Chitsulo L, Engels D, Montresor A, Savioli L (2000) The global status of schistosomiasis and its control. Acta Trop 77: 41–51.
- 5. Utzinger J, Raso G, Brooker S, de Savigny D, Tanner M, et al. (2009) Schistosomiasis and neglected tropical diseases: towards integrated and sustainable control and a word of caution. Parasitology 136: 1859–1874.
- 6. Utroska JA, Chen M, Dixon H, Yoon S, Helling-Borda M, et al. (1989) An estimate of the global needs for praziquantel within schistosomiasis control programmes. Geneva: World Health Organization.
- 7. Malone JB, Yilma JM, McCarroll JC, Erko B, Mukaratirwa S, et al. (2001) Satellite climatology and the environmental risk of Schistosoma mansoni in Ethiopia and East Africa. Acta Trop 79: 59–72.
- 8. Bavia ME, Malone JB, Hale L, Dantas A, Marroni L, et al. (2001) Use of thermal and vegetation index data from earth observing satellites to evaluate the risk of schistosomiasis in Bahia, Brazil. Acta Trop 79: 79–85.
- 9. Clements ACA, Firth S, Dembelé R, Garba A, Touré S, et al. (2009) Use of Bayesian geostatistical prediction to estimate local variations in Schistosoma haematobium infection in western Africa. Bull World Health Organ 87: 921–929.
- 10. Brooker S, Donnelly CA, Guyatt HL (2000) Estimating the number of helminthic infections in the Republic of Cameroon from data on infection prevalence in schoolchildren. Bull World Health Organ 78: 1456–1465.
- 11. Brooker S, Hay SI, Issae W, Hall A, Kihamia CM, et al. (2001) Predicting the distribution of urinary schistosomiasis in Tanzania using satellite sensor data. Trop Med Int Health 6: 998–1007.
- 12. Clements ACA, Bosqué-Oliva E, Sacko M, Landouré A, Dembélé R, et al. (2009) A comparative study of the spatial distribution of schistosomiasis in Mali in 1984–1989 and 2004–2006. PLoS Negl Trop Dis 3: e431.
- 13. Doumenge JP, Mott KE, Cheung C, Villenave D, Chapuis O, et al. (1987) Atlas of the global distribution of schistosomiasis. Bordeaux: WHO-CEGET-CNRS, Presses Universitaires de Bordeaux.
- 14. Brooker S, Hotez PJ, Bundy DAP (2010) The global atlas of helminth infection: mapping the way forward in neglected tropical disease control. PLoS Negl Trop Dis 4: e779.
- 15. Hürlimann E, Schur N, Boutsika K, Stensgaard AS, Laizer N, et al. (2011) Toward an open-access, real-time global database for mapping, control, and surveillance of neglected tropical diseases. PLoS Negl Trop Dis (under review).
- 16. Diggle PJ, Tawn JA, Moyeed RA (1998) Model-based geostatistics. Appl Stat 47: 299–350.
- 17. Gosoniu L, Vounatsou P, Sogoba N, Smith T (2006) Bayesian modelling of geostatistical malaria risk data. Geospat Health 1: 127–139.
- 18. Kleinschmidt I, Bagayoko M, Clarke G, Craig M, Le Sueur D (2000) A spatial statistical approach to malaria mapping. Int J Epidemiol 29: 355–361.
- 19. Raso G, Matthys B, N'Goran EK, Tanner M, Vounatsou P, et al. (2005) Spatial risk prediction and mapping of Schistosoma mansoni infections among schoolchildren living in western Côte d'Ivoire. Parasitology 131: 97–108.
- 20. U.S. Geological Survey (USGS) MODIS Reprojection Tool. Available at: http://lpdaac.usgs.gov/landdaac/tools/modis/index.asp. Accessed: 25 December 2008.
- 21. Digital Equipment Corporation Visual Fortran Version 6.0. Available at: www.fortran.com. Accessed: 14 August 2008.
- 22. Clark Labs IDRISI 32. Worcester, MA, USA: Clark University. Available at: www.clarklabs.org. Accessed: 22 August 2008.
- 23. Environmental Systems Research Institute (ESRI) ArcMap v. 9.1. Redlands, CA, USA: Environmental Systems Research Institute. Available at: www.esri.com. Accessed: 22 August 2008.
- 24. Earth Resources Data Analysis System (ERDAS) Imagine 9.3. Atlanta, USA: ERDAS Inc. Available at: http://www.erdas.com/. Accessed: 5 January 2009.
- 25. Food and Agriculture Organization (FAO) Global Agro-Ecological Zones (GAEZ). Available at: http://www.iiasa.ac.at/Research/LUC/GAEZ/index.htm. Accessed: 6 May 2010.
- 26. United Nations, Department of Economic and Social Affairs, Population Division (2007) World Population Prospects: The 2006 Revision, Highlights. Working Paper No. ESA/P/WP.202.
- 27. U.S. Census Bureau, Population Division International Data Base (IDB) - Tables. Available at: http://www.census.gov/ipc/www/idb/tables.html. Accessed: 9 July 2008.
- 28. StataCorp LP STATA/SE v.9.2. Available at: www.stata.com. Accessed: 14 August 2008.
- 29. Banerjee S, Gelfand AE, Finley AO, Sang H (2008) Gaussian predictive process models for large spatial data sets. J R Stat Soc Series B Stat Methodol 70: 825–848.
- 30. Gosoniu D, Gosoniu L, Tille Y, Vounatsou P (2011) Subsampling the Gaussian process of very large geostatistical data - does the sampling approach matter? Comput Stat Data Anal (under review).
- 31. Rumisha SF, Gosoniu D, Kasasa S, Smith TA, Abdulla S, et al. (2011) Bayesian modeling of large geostatistical data to estimate seasonal and spatial variation of sporozoite rate. Stat Methods Appl (under review).
- 32. Lengeler C, Utzinger J, Tanner M (2002) Questionnaires for rapid screening of schistosomiasis in sub-Saharan Africa. Bull World Health Organ 80: 235–242.
- 33. Clements ACA, Garba A, Sacko M, Touré S, Dembelé R, et al. (2008) Mapping the probability of schistosomiasis and associated uncertainty, West Africa. Emerg Infect Dis 14: 1629–1632.
- 34. Gemperli A, Sogoba N, Fondjo E, Mabaso M, Bagayoko M, et al. (2006) Mapping malaria transmission in West and Central Africa. Trop Med Int Health 11: 1032–1046.
- 35. Gosoniu L, Vounatsou P, Sogoba N, Smith T (2011) Mapping malaria risk in West Africa using a Bayesian nonparametric non-stationary model. Commun Stat Theory Methods (in press).
- 36. Riedel N, Vounatsou P, Miller JM, Gosoniu L, Chizema-Kawesha E, et al. (2010) Geographical patterns and predictors of malaria risk in Zambia: Bayesian geostatistical modelling of the 2006 Zambia national malaria indicator survey (ZMIS). Malar J 9: 37.
- 37. Schur N, Gosoniu L, Raso G, Utzinger J, Vounatsou P (2011) Modelling the geographical distribution of co-infection risk from single disease survey data. Stat Med (in press). DOI:10.1002/sim.4243.
- 38. Tchuem Tchuenté LA, Southgate VR, Jourdane J, Webster BL, Vercruysse J (2003) Schistosoma intercalatum: an endangered species in Cameroon? Trends Parasitol 19: 389–393.
- 39. Bergquist R, Johansen MV, Utzinger J (2009) Diagnostic dilemmas in helminthology: what tools to use and when? Trends Parasitol 25: 151–156.
- 40. Gemperli A, Vounatsou P, Sogoba N, Smith T (2006) Malaria mapping using transmission models: application to survey data from Mali. Am J Epidemiol 163: 289–297.
- 41. Vounatsou P, Raso G, Tanner M, N'Goran EK, Utzinger J (2009) Bayesian geostatistical modelling for mapping schistosomiasis transmission. Parasitology 136: 1695–1705.
- 42. Ecker MD, Gelfand AE (2003) Spatial modeling and prediction under stationary non-geometric range anisotropy. Environ Ecol Stat 10: 165–178.
- 43. Gosoniu GD, Vounatsou P, Kahn K, Tillé Y (2011) Geostatistical modeling of large non-Gaussian irregularly distributed data. Computation Stat (under review).