High-Resolution Spatial Distribution and Estimation of Access to Improved Sanitation in Kenya

Background Access to sanitation facilities is imperative in reducing the risk of multiple adverse health outcomes. A distinct disparity in sanitation exists among different wealth levels in many low-income countries, which may hinder the progress across each of the Millennium Development Goals. Methods The surveyed households in 397 clusters from 2008–2009 Kenya Demographic and Health Surveys were divided into five wealth quintiles based on their national asset scores. A series of spatial analysis methods including excess risk, local spatial autocorrelation, and spatial interpolation were applied to observe disparities in coverage of improved sanitation among different wealth categories. The total number of the population with improved sanitation was estimated by interpolating, time-adjusting, and multiplying the surveyed coverage rates by high-resolution population grids. A comparison was then made with the annual estimates from United Nations Population Division and World Health Organization /United Nations Children's Fund Joint Monitoring Program for Water Supply and Sanitation. Results The Empirical Bayesian Kriging interpolation produced minimal root mean squared error for all clusters and five quintiles while predicting the raw and spatial coverage rates of improved sanitation. The coverage in southern regions was generally higher than in the north and east, and the coverage in the south decreased from Nairobi in all directions, while Nyanza and North Eastern Province had relatively poor coverage. The general clustering trend of high and low sanitation improvement among surveyed clusters was confirmed after spatial smoothing. Conclusions There exists an apparent disparity in sanitation among different wealth categories across Kenya and spatially smoothed coverage rates resulted in a closer estimation of the available statistics than raw coverage rates. Future intervention activities need to be tailored for both different wealth categories and nationally where there are areas of greater needs when resources are limited.


Study Area
As one of the countries in East Africa, Kenya is located on the equator and bordered by Tanzania to the south, Uganda to the west, South Sudan to the northwest, Ethiopia to the north, and Somalia to the northeast. It has a land area of approximately 581,082 km 2 and a population of nearly 40.5 million people according to a 2010 estimate by the World Bank. Following a High Court ruling in September 2009, Kenya is divided into 47 counties that belong to eight provinces (from largest to smallest in area: Rift Valley, Eastern, North Eastern, Coast, Central, Nyanza, Western, and the capital Nairobi). However, the population ranking based on a 2009 Kenya Census is different from the size ranking of the provinces (Table 1)

Data Collection
Household-level data from the 2008-2009 Kenya Demographic and Health Surveys (KDHS) were collected from a representative sample of 10,000 households grouped into clusters randomly selected within a two-stage sampling design [26,27]. Each cluster was represented by a GPS point with latitude and longitude coordinates and defined as either an urban or rural cluster [28]. There are 397 total clusters included in this study after initial data quality control and processing, with 384 clusters including 19-25 households and the remaining 13 clusters including 9-18 households. Among all the surveyed clusters, 132 are urban and 265 are rural clusters. Their actual locations were offset towards a random direction by a maximum of 2 kilometers for urban clusters and a maximum of 5 kilometers for rural clusters, with a further 1% of the rural clusters displaced a maximum of 10 kilometers, to maintain confidentiality of the surveyed respondents.
The AfriPop project, now becoming part of the WorldPop project, is at present the best freely available source of data for population-related studies in Africa and in most cases it has outperformed the African Population Database (APD), Global Rural-Urban Mapping Project (GRUMP), Gridded Population of the World version 3 (GPW3), and LandScan in Kenya [29]. The AfriPop 2010, adjusted to match UN national estimates with a resolution of 100 meters, was used in this study to represent the population density (persons/ha) (Fig 1).

Data Processing
Drinking water supply and sanitation facilities have been included as household assets in the default asset scores and quintiles of DHS datasets [17]. Therefore, an asset score for each household was constructed using Principal Components Analysis (PCA) in Stata 12 [30], on the basis of household ownership of durable goods and housing characteristics excluding sanitation and drinking water supply [31,32]. According to the asset scores, all households in Kenya were ranked and divided into five ordered subgroups called wealth quintiles, where the first quintile included the poorest 20% households and the fifth quintile included the wealthiest 20%.
Five subsets of clusters representing five quintiles were created, each of which included all of the clusters that contained at least one household in the corresponding quintile. It is worth noting that those five subsets were not mutually exclusive. As the surveyed households in one cluster might fall in multiple quintiles but be only represented by one GPS point, a cluster can appear in all subsets only if it included at least one household in the corresponding wealth quintile.
An improved sanitation facility was defined by JMP as one that 'hygienically separates human excreta from human contact', for MDG monitoring [14]. In this study, five types of toilet facilities are defined as improved: (1) Pit latrines with slabs, (2) ventilated improved pit latrines (VIP), (3)(4)(5) toilets that flush or pour to piped sewer systems/septic tanks/pit latrines. Pit latrines without slabs and flush toilets that do not have an outflow to an appropriate site were considered unimproved sanitation. Shared and private facilities were included provided they met the definition of improved facilities. The percentage of households with any of the five improved types in each cluster was calculated as the coverage rate of improved sanitation for the overall population and for each quintile and accounted for the survey design and weights in Stata 12 [30].

Data Analysis
Excess risk in the overall population and in each wealth quintile was assessed where the risk for lack of improved sanitation was higher (excess ratio risk >1) and lower (excess ratio risk <1) than the national average level, and where extremely high risk was present (excess ratio risk >2). The national average level was calculated as the ratio between the total number of households without improved sanitation and the number of all surveyed households, instead of simply using the average of all clusters. The excess risk in each quintile was calculated as: where r cq = excess risk for the cluster c in quintile q, n cq = number of households without improved sanitation in cluster c in quintile q, N cq = total number of households in cluster c in quintile q, n x = total number of households without improved sanitation in cluster x, N x = total number of surveyed households in cluster x, and P represents all 397 clusters in this study.
A Thiessen polygon was created around each cluster, within which any location was closer to the centered cluster than other surrounding clusters. According to Tobler's first law of geography that 'everything is related to everything else, but near things are more related than distant things [33],' spatial smoothing was used to produce a corresponding estimate to the raw coverage rate of each cluster from a collection of neighboring clusters enclosed by Thiessen polygons. The first order rook contiguity was applied as the spatial smoothing rule in this study, meaning that all neighboring polygons sharing a border of some length that is longer than a point-length border, with the target Thiessen polygon were defined as neighbors.
Local Moran's I were calculated using GeoDa [34] to identify clusters of high and low coverage of improved sanitation, where a Random Permutation Procedure (RPP) was additionally used to replicate the statistics 999 times to generate reference distributions. By comparison with reference distributions, the statistical significance of high and low spatial clustering was tested. The Local Moran's I was calculated for both raw and spatially-smoothed rates.

Interpolation
There are two broad classes of interpolation methods, deterministic (for example, inverse distance weighted (IDW) interpolation) and probabilistic (for example, Kriging interpolation), both of which were used in this study to predict values where sampling was not conducted [35]. According to Tobler's first law of geography, the IDW interpolation uses the measured values surrounding each unsampled location, weighted by the proximity to this location, to predict the value.
A semivariogram is a function of the distance and direction separating two locations, which is estimated from the data for quantifying the spatial dependence in the data and assumed to be the same at all locations (spatial homogeneity) in the classical Kriging method. This assumption rarely holds true in practice, so the empirical Bayesian Kriging (EBK) interpolation was used to account for the variance by estimating multiple semivariogram models from the data instead of a standalone semivariogram. The selection of models and parameters in the common Kriging methods, such as ordinary and universal Kriging, was automated in the EBK.
The raw and spatially smoothed coverage rates of improved sanitation in the overall population and in each quintile were interpolated by the IDW and EBK based on all clusters that included at least one household in that quintile. The root mean squared error (RMSE) was calculated by omitting a measured value then predicting its value based on surrounding point values. The measured and predicted values were compared through an iterative cross validation process within each model to compare the level of accuracy different models predicted as values at unobserved sites. The IDW and EBK interpolation models for the overall population and for each quintile were both adjusted to obtain a minimal RMSE, and the ones with the smaller RMSE, either IDW or EBK, were selected as final models.

Estimating the Population with Improved Sanitation
Both the continuous raw and spatially-smoothed coverage rates of improved sanitation for the overall population, interpolated by final models, were re-sampled to spatially match the actual population distribution in AfriPop, and also adjusted separately in urban and rural areas for temporally matching the 2010 AfriPop. Adjustment rates between 2008-2010 for urban and rural areas were determined by linear regression, established based on multiple surveys during 1989-2008, including DHS (1989DHS ( , 1993DHS ( , 1998DHS ( , 2003 [36]. Due to a lack of city boundaries, the urban areas were extracted under the assumption that the population density is greater in the city than outside. According to the World Bank estimates, 24% of the total population in Kenya lived in urban areas in 2010 (23% in 2009, and also 24% in 2011 and 2012). All grid cells were ranked in order from highest to lowest population density. Starting with the highest population density, all cells including 24% of the total population were extracted as urban areas in this study, and the remaining areas were marked as rural. The total number of the population covered by improved sanitation was the sum of urban and rural estimates, separately calculated by adding together all grid cell values by time-adjusted coverage rates and population.
Additionally, the interpolated raw and spatially-smoothed coverage rates of improved sanitation for the overall population and each quintile were masked with the populated areas (population density 1 person/km 2 , namely 0.01 person/ha in the AfriPop 2010), and then averaged at the county level for a better view.

Raw Coverage
Each household was assigned to one of five quintiles, according to its national asset scores. The numbers and percentages of clusters with different households in each subset (quintile) are shown in Fig 2. When comparing the distribution of surveyed households for the five quintiles (Fig 3), we found a clear transition of wealth from periphery to the capital Nairobi. Most of the households in quintile 1 are located in the north and west (Western and Nyanza Province) of Kenya and southeastern coastline regions (Coast Province). The households in quintile 2 were primarily clustered in the west, with the clustering trend in the north and southeastern coastlines waning. In quintile 3, an obvious clustering trend emerged in the Central Province and this trend continued into quintile 4. The majority of the richest 20% households in quintile 5 were congregated in Nairobi to the south of the Central Province, and in Mombasa in the southeast, the second largest city in Kenya.
The numbers and percentages of sampled clusters with varying coverage of improved sanitation in each quintile are shown in Fig 4, where a clear rise and decline in percentage of the clusters with 80-100% and 0-20% coverage, respectively, were indicated from quintile 1 to 5. The differences in raw coverage rates among the sampled clusters between each subsequent and preceding quintile were statistically tested ( Table 2). It was obvious that the coverage rates of improved sanitation have continuously increased from the poorest to the richest quintile, with significant differences between each subsequent and preceding quintile (α 0.001).
Excess risk was mapped overall and for each quintile to illustrate the degree of potential risks for sanitation-related diseases due to low coverage (Fig 5). The clusters with highest risks relative to the national average were highlighted with larger rose and red points. The distribution of high risks was generally consistent with that of low coverage and clusters having households in the poorest quintiles. For example, quintile 1 has the highest number of red points representing clusters with extremely high excess risk (ranging from 2 to 11.4). In contrast, clusters with households in quintile 5 showed lower excess risk, with most clusters falling into the low excess risk range (0-0.5).
Local Moran's I was calculated for the overall population (Fig 6a), demonstrating the clustering of high (High-High) and low coverage (Low-Low), high surrounded by low coverage (High-Low), and low surrounded by high coverage (Low-High). The majority of clusters with  Although there were few clusters with high coverage in the north and west of Kenya, each of these clusters was surrounded by clusters with poor coverage. Clustering with low coverage was found in the Lake Victoria region and northern rural regions. The raw coverage rates of improved sanitation were interpolated based on surveyed households by EBK interpolation (Fig 7), which produced less RMSEs than IDW for the overall population and for the five quintiles ( Table 3). The general trend in coverage rates of improved sanitation has increased from quintile 1 to 5, due primarily to the elevated rates at sampled clusters. A balance for the resolution of spatial heterogeneity was found, where interpolated coverage rates were re-sampled, masked with the populated areas ( 0.01 person/ha), and averaged by county (Fig 8). Except Bungoma (58.7%) and Trans Nzoia County (53.0%) in the west, nine out of eleven counties with more than 50% coverage rates were located in the south, forming a continuous band from Nyandarua (54.9%) to Kwale (52.7%), going through Muranga (52.1%), Kiambu (66.0%), Nairobi (94.0%), Machakos (52.1%), Kajiado (59.9%), Taita Taveta (63.2%), and Mombasa (60.5%). The coverage rates in the south, decreasing from Nairobi in all directions, were generally higher than in the north and east, while Nyanza and North Eastern Province had relatively poor coverage.

Spatially-Smoothed Coverage
Similarly, a spatially-smoothed coverage rate was calculated for each cluster from a collection of its neighboring clusters to overcome the small number issue that may happen in this study, since 26.2%-45.7% of clusters in separate quintile subsets included only three households or less (Fig 2). The spatially-smoothed coverage rates had stronger correlation with the predicted coverage rates, which were produced by omitting a point and calculating the rate of this   location based on the remaining points. The numbers and percentages of clusters with different smoothed coverage in each quintile are shown in Fig 9. The percentages of clusters with high coverage in quintile 1 and 2, and of the clusters with low coverage in quintile 4 and 5 declined significantly, which resulted from spatial smoothing that adjusted the raw coverage rates of clusters in each quintile towards a level suitable for the average economic status at that level. Quintile 3 was a middle-level status, so the raw coverage rates in clusters at two ends were adjusted towards the middle with the percentages of the clusters with 0-20% and >80-100% coverage dwindling and of the clusters with >20-80% coverage growing. The spatially-smoothed coverage rates interpolated on the basis of sampled clusters were shown in Fig 10, where EBK remained superior to IDW. Likewise, the continuous smoothed rates were re-sampled, masked with populated areas (population density 1 person/km 2 ), and averaged by county (Fig 11). The direction (increase or decrease) and degree of spatial smoothing in all counties, measured by subtracting the raw from spatially-smoothed coverage rates, were categorized (Fig 12). In general, the spatially-smoothed coverage rates decreased in the High-Resolution Mapping and Estimation of Improved Sanitation in Kenya south (surrounding Nairobi), and increased in the north and west (surrounding Lake Victoria), with the degree of being spatially smoothed varying by wealth and across Kenya. In the subsets, the rates in quintile 1 and 5 were most intensively under-and over-estimated, respectively. The degree of spatial smoothing in each quintile was examined against the numbers of surveyed households within clusters. A consistent trend in all five quintiles indicated that the coverage in clusters including fewer households was more drastically changed by spatial smoothing.
Local Moran's I was also calculated for the overall population after spatial smoothing (Fig  6b). The general clustering trend of hot and cold spots among sampled clusters was confirmed after smoothing, although clustering of spatial outliers was eliminated.

Estimated Population with Improved Sanitation
Among the surveyed clusters by the 2008-2009 KDHS, mean coverage rates were 82.1% and 35.1% in urban and rural regions, respectively, while the median rates were 91.5% and 20.8% in urban and rural areas, respectively. According to the linear regression, the estimated urban coverage rates in 2008 and 2010 were 78% and 79.2%, respectively, while the estimated rural coverage rates in 2008 and 2010 were 50% and 50.5%, respectively. Therefore, growth rates of 1.2% and 0.5% were added to the interpolated coverage rates in urban and rural grid cells, respectively, under the assumption that the coverage rate has increased by the same extent nationwide during 2008-2010. Adding together all products by time-adjusted coverage rates and total population, the estimated total population with improved sanitation in 2010, based on raw and spatially-smoothed rates, were 19,875,164 (6,569,164 in urban and 13,306,000 in rural regions) and 18,978,948 (6,084,058 in urban and 12,894,890 in rural areas), respectively. When divided by the total population, the estimated overall coverage rates, before and after spatial smoothing, were 49.1% (67% in urban and 43.5% in rural regions) and 46.9% (62.1% in urban and 42% in rural areas), respectively.
According to annual updates of the Progress on Drinking Water and Sanitation, nationwide estimates of coverage rates of improved sanitation (improved + shared) in 2008 were 78% and 50% for urban and rural Kenya, respectively [22]. According to the 2012 update report, estimates for urban and rural Kenya in 2010 were 80% and 53%, respectively [25]. However, estimates for urban areas in Kenya in 2011 [14] and 2012 [24] were only 78% and 79%, respectively, while estimates for rural Kenya were 48% in both 2011 and 2012. It implies that

Discussion
Declared as the International Year of Sanitation, the year 2008 serves as an important baseline for assessing the progress towards the Millennium Development Goal (MDG) target to reduce by half the proportion of the 2.6 billion people without access to basic sanitation by 2015. Therefore, a basic understanding of the status quo of the sanitation in the baseline year is a necessary requirement for future strategies. In addition to several recent and encouraging studies in Kenya, the availability of data sources in proximate years (improved sanitation coverage This is the first study to comprehensively analyze the spatial distribution of improved sanitation coverage among Kenyan households by their socioeconomic status at the county and even finer levels. Among all surveyed households, we found that wealthier populations were located in central regions in or near Nairobi and poorer populations were dispersed outwards from Nairobi in all directions, particularly in the northwest and northeast, and Lake Victoria region. The patterns of high coverage of improved sanitation were generally consistent with locations where wealthier populations lived. Wealthier populations were often collocated in areas with the most improved sanitation services and poorer populations were often collocated in areas with less effective sanitation services. Excess risk and Local Moran's I showed the relationship between sanitation nationally and across socioeconomic status using different parameters. The former showed the extent of potential risk in surveyed clusters relative to the national average of households without sanitation, independent of surrounding clusters. The latter showed the relationship of improved sanitation coverage for each cluster to neighboring clusters. Excess risk in each quintile indicated that a gap of coverage existed among households with comparable socioeconomic status. Clusters with the poorest households (quintile 1) had more than twice the excess risk relative to the national average in both clusters around urban areas of Nairobi and Kisumu as well as in urban areas in northern Kenya. Households in quintile 3 generally had lower risk for lack of improved sanitation than the poorest households, but still show significant excess risk, especially around urban areas. Significantly higher risks appeared in the west and middle regions in quintile 4, possibly as the clusters had greater wealth heterogeneity, with richer households located near middle and poorer households. Additionally, a minor clustering of low coverage in quintile 4 occurred in the Lake Victoria region.
Most clusters fell into two categories, namely hot spots and cold spots, indicating that areas of high and low coverage are often surrounded by neighboring clusters with similar coverage rates of improved sanitation. This is important in deciding areas of highest priority for risk for sanitation-related diseases. In contrast, high coverage clusters were infrequently surrounded by low coverage clusters and such a relationship was dispersed across Kenya. Local Moran's I using spatial smoothing coverage likewise considered not only the coverage in each cluster but also in their neighborhoods, to adjust outliers to a local average level. Most of the noticeably adjusted clusters had inconsistent raw rates within neighborhoods or small numbers of households in quintile subsets. For example, rates in most clusters with less than three households in all quintiles were adjusted, but in most clusters with more than ten households the rates remained or were only slightly adjusted. In addition to adjusting possible outliers, spatial smoothing also more clearly showed the same trend resulting from raw coverage estimates. Further surveys and analytical methods are needed to assess the accuracy of spatial smoothing used to adjust inconsistent rates, especially in regions with large differences in coverage before and after smoothing.
The IDW interpolation, widely used in many socioeconomic, disease, and health studies [37][38][39][40], was used for comparison purposes. Since the numbers of households in clusters varied considerably in quintile subsets, from 1 to 25 depending on the national asset scores of households, we used the numbers of households to weight the clusters for the IDW interpolation under the assumption that the rates in clusters with more surveyed households were more reliable. However, the RMSE from the IDW was higher compared to the EBK interpolation in all five subsets and the overall dataset, which demonstrated the strengths of the EBK for small datasets over not only other kriging methods, but also the deterministic interpolation approach.
A primary concern was the representativeness of surveyed clusters. The low coverage was partly attributed to rural clusters across Kenya. Fewer clusters were sampled in the northern and eastern regions than in the central and western regions. Spatial patterns of coverage in wealth quintiles were in part attributed to the sampling strategy. For instance, most of the surveyed households in the north were located in rural areas with very low population density, while all surveyed households in Nairobi and Mombasa were in urban areas. Furthermore, all households in each of 397 clusters were sparsely sampled instead of completely surveyed, subjecting them to some level of bias.
There is an ongoing debate about whether shared facilities are able to provide a similar level of protection as private ones [17]. In this study, we included the households with all types of improved sanitation facilities regardless of their status as shared or private. When averaging the coverage rates by county, all populated areas within each county were given the same weight under the assumption that the population density was homogeneous across each county. Additional approaches should be explored to seek more fairly representative values of the average level in each county. Additionally, spatial smoothing may misinterpret some truly abnormal values as outliers, which would introduce new sources of error and mitigate the degree of true spatial heterogeneity.
The estimates from our approach, even prior to spatial smoothing, are apparently lower than the JMP estimates in both urban and rural areas, most likely due to the random shift of the original location of the surveyed urban clusters away from the actual urban areas. This resulted in the urban clusters not matching the underlying urban grid cells in AfriPop. Therefore, when calculating the total urban population with improved sanitation, the coverage rate was most likely multiplied by the underlying rural grid cells with relatively low population density, instead of the correct urban grid cells with high population density. This greatly reduced the size of covered population in urban areas. In contrast to urban regions, rural regions possess relatively large areas, which made estimates of the rural coverage rates less subjective to the mismatch resulting from the random shift in the original data. Another limitation of our estimation approach may lie in the assumption that the population density is greater in the city than outside the city when determining urban boundaries. Some rural areas may be misclassified as urban areas due to their high population density. However, using an only cut-off value of population density to extract urban boundaries is considered more arbitrary by comparison, as the cut-off value may vary over time and across countries. We thus took advantage of the strengths of the high-resolution AfriPop population grids, which captured the clustering of the population at a finer resolution.
This study suggests that interpolating and time-adjusting coverage rates in sampled clusters in DHS, matching with and multiplying by AfriPop population grids, is a feasible and acceptable approach to estimate the size of the population with improved sanitation and their spatial distribution, especially for the rural areas. Due to the limited urban areas in low-and middleincome countries, the estimates in the urban areas need to be explained with caution. This study adds to the growing body of literature highlighting the importance of improved sanitation, demonstrating spatial evidence for future intervention activities to be tailored for both different wealth categories and nationally where there are areas of greater needs when resources are limited.