On the Identification of Associations between Five World Health Organization Water, Sanitation and Hygiene Phenotypes and Six Predictors in Low and Middle-Income Countries

Background According to the most recent estimates, 842,000 deaths in low- to middle-income countries were attributable to inadequate water, sanitation and hygiene in 2012. Despite billions of dollars and decades of effort, we still lack a sound understanding of which kinds of WASH interventions are most effective in improving public health outcomes, and an important corollary–whether the right things are being measured. The World Health Organization (WHO) has made a concerted effort to compile comprehensive data on drinking water quality and sanitation in the developing world. A recent 2014 report provides information on three phenotypes (responses): Unsafe Water Deaths, Unsafe Sanitation Deaths, Unsafe Hygiene Deaths; two grouped phenotypes: Unsafe Water and Sanitation Deaths and Unsafe Water, Sanitation and Hygiene Deaths; and six explanatory variables (predictors): Improved Sanitation, Unimproved Water Source, Piped Water To Premises, Other Improved Water Source, Filtered and Bottled Water in the Household and Handwashing. Methods and Findings Regression analyses were performed to identify statistically significant associations between these mortality responses and predictors. Good fitted-model performance required: (1) the use of population-normalized death fractions as opposed to number of deaths; (2) transformed response (logit or power); and (3) square-root predictor transformation. Given the complexity and heterogeneity of the relationships and countries being studied, these models exhibited remarkable performance and explained, for example, about 85% of the observed variance in population-normalized Unsafe Sanitation Death fraction, with a high F-statistic and highly statistically significant predictor p-values. Similar performance was found for all other responses, which was an unexpected result (the expected associations between responses and predictors–i.e., water-related with water-related, etc. did not occur). 
The set of statistically significant predictors remains the same across all responses. That is, Unimproved Water Source (UWS), Improved Sanitation (IS) and Filtered and Bottled Water in the Household (FBH) were the only statistically significant predictors whether the response was Unsafe Sanitation Death Fraction, Unsafe Hygiene Death Fraction or Unsafe Water Death Fraction. Moreover, the fraction of variance explained for all fitted models remained relatively high (adjusted R2 ranges from 0.7605 to 0.8533). We find that two of the statistically significant predictors, Improved Sanitation and Unimproved Water Sources, are particularly influential. We also find that some predictors (Piped Water to Premises, Other Improved Water Sources) have very little explanatory power for predicting mortality, that one (Other Improved Water Sources, OIWS) has a counterintuitive effect on response (Unsafe Sanitation Death Fraction increases with increases in OIWS), and that one (Handwashing) has essentially no explanatory usefulness. Conclusions Our results suggest that a higher priority may need to be given to improved sanitation than has been the case. Nevertheless, while our focus in this paper is mortality, morbidity is a staggering consequence of inadequate water, sanitation and hygiene, and lower impact on mortality may not mean a similarly low impact on morbidity. More specifically, those predictors that we found uninfluential for predicting mortality-related responses may indeed be important when morbidity is the response.


Introduction
According to the most recent estimates, 842,000 deaths in low- to middle-income countries were attributable to inadequate water, sanitation and hygiene in 2012 [1]. This figure represented 58% of total deaths attributed to diarrheal disease which, in turn, constituted an estimated 1.5% of the total Global Burden of Disease (GBD) [1,2]. It is a notable reduction from the estimated 88% of total deaths attributed to diarrheal disease related to inadequate WASH in 2000. Diarrheal deaths as a whole fell from an estimated 2.2 million in 2000 to 1.5 million in 2012 [1-4].
UNICEF and WHO, in a report on progress towards meeting the Millennium Development Goals (MDGs), state that the MDG for drinking water (a 50% reduction in the number of people without sustainable access to safe drinking water) was achieved by 2010, five years ahead of schedule. According to their figures, 91% of the world's population now has access to improved drinking water [5]. Note that the term 'improved' does not necessarily mean 'safe' according to WHO standards [4,6,7]. The UNICEF/WHO report places particular emphasis on the category of piped water to premises, which it describes as the highest level of service in drinking water supply. The report goes on to acknowledge that the MDG for improvement in sanitation (halving the number of people without access to basic sanitation) has been missed by some 700 million people [5].
There has been heartening progress. Nevertheless, it is plain that much remains to be done [8,9]. The way forward, however, is still unclear. There are three principal difficulties. First, the data we have available is of uncertain reliability, coverage and comparability across countries. This is true both for the data on mortality and on the presence or absence of different categories of WASH. Many of the available numbers may be considerably out of date. For example, 21 sub-Saharan African countries conducted no household surveys in the years 2006-2013. It is likely that the most vulnerable are underrepresented, including people living in urban slums or in conflict zones [7,10,11]. Second, the data is unusually heterogeneous, deriving from a wide range of sources including censuses, national registers and household surveys. This adds to the difficulty of interpreting statistical results. Third, despite decades and billions of dollars of effort, our understanding of which kinds of WASH interventions are most effective in improving public health outcomes is still weak. The lack of adequate randomized controlled trials of different kinds of intervention has been widely commented upon [12,13].
The World Health Organization (WHO) has made a concerted effort to compile the most comprehensive and reliable collection of information to date [1]. This data, along with analyses of the GBD data and with meta-analyses of prior studies, has been closely examined in a noteworthy series of papers published in the Journal of Tropical Medicine and International Health in 2014 [2,3,7,14-16].
Clasen et al. describe the evolution of the GBD, including changes in methods, definitions and scope. They also describe an alternative approach to understanding the impact of WASH interventions, using population intervention modeling [14]. Prüss-Ustün et al. [2] estimate the burden of diarrheal disease that can be attributed to exposure to inadequate water, sanitation and hygiene based on exposure data and a related exposure-risk relationship. This is where the figures cited above were obtained. They further decompose the global estimate of 842,000 deaths into 502,000 deaths related to inadequate drinking water, 280,000 deaths related to inadequate sanitation and 267,000 deaths related to inadequate hand hygiene. For children under five, they estimate 361,000 preventable deaths due to WASH-related diarrhea, or 5.5% of deaths in that age group.
Freeman et al. [15] provide a meta-analysis of hand washing studies. They estimate that adequate hand washing may reduce the risk of diarrheal disease by 40%. However, when they adjusted for unblinded intervention studies, this figure declined to 23%. They also suggest that only 19% of the world's population practices adequate hand hygiene [15]. Bain et al. estimate exposure to fecal contamination in different kinds of drinking water source based on household surveys and censuses. They combined this with a meta-analysis of 345 water quality studies. They estimate that 1.8 billion people worldwide drink water from contaminated sources, with at least 1.1 billion exposed to at least moderate risk (> 10 E. coli or thermotolerant coliform per 100 ml). They found that an estimated 10% of "improved" water sources may be high risk with at least 100 E. coli or thermotolerant coliform per 100 ml. Significantly, this category includes piped water supplies. They found also that people living in rural areas and people living in Africa and Southeast Asia were at higher risk from contaminated drinking water sources. They suggest that the sizeable improvements in mortality related to WASH and diarrheal disease in the GBDs may overstate the actual gains [3,4,6,17]. Wolf et al. offer a comprehensive meta-analysis of studies investigating the effect of drinking water and sanitation improvements. The studies range in date from 1970 to 2013 and include randomized controlled trials, quasi-randomized trials and different kinds of observational study. They show that improvements in water supply and sanitation reduce the risk of diarrheal disease. Greater impacts were associated with filtering, high quality piped water and sewerage connection [16].
In a later meta-analysis, Clasen et al. [12] examined the health impacts of different kinds of water supply interventions, including Point of Use (POU) technologies. The latter are increasingly seen as a viable interim measure in very poor and/or rural areas where the timely implementation of well-managed piped water systems is highly unlikely [18]. The authors found little evidence to support the idea that improvements at the source (e.g., protected wells, standpipes) had any significant impact on the burden of diarrhea. They were cautiously optimistic about POUs. This study draws particular attention to the fragility of much of the data that is available to assess WASH interventions.
These are valuable findings. They confirm that inadequate WASH infrastructure is related to negative public health outcomes and improvements in these outcomes vary according to different types of interventions. The authors, nevertheless, offer a number of caveats concerning our ability to derive policy-relevant inferences from the data and urge efforts at better data collection. Empirical evidence concerning the effects of a range of interventions is also heterogeneous and fragmentary. Decades of efforts in the field have not resolved recurrent, policy-relevant questions such as whether improvements in water supply are effective in the absence of improved sanitation or in the absence of behavioral changes such as hand washing [12,19-26]. A better sense of the relative importance and interactions of predictor variables is sorely needed to guide investment strategies in WASH interventions.
This paper reports on an analysis of the latest WHO data that takes a somewhat different approach than has been used to date. Our goals in this endeavor are two-fold:
1. To distinguish among predictors with a finer resolution to determine their relative importance;
2. To generate hypotheses about the data itself concerning, for example, the treatment of outliers or the possible interpretations of certain predictors such as "piped water to premises" and "other improved water sources".

Data
The data used in this work are described in the 2014 WHO report Preventing Diarrhoea Through Better Water, Sanitation and Hygiene: Exposures and Impacts in Low- and Middle-Income Countries [1]. It provides data on six predictors: Piped Water to Premises (PWTP); Other Improved Drinking-Water Sources (OIWS); Unimproved Drinking-Water Sources (UWS); Filtered and Bottled Water in the Household (FBH); Improved Sanitation (IS); and Handwashing (HW). The responses in our analyses all involved mortality and included Unsafe Sanitation Deaths (USD), Unsafe Hygiene Deaths (UHD), Unsafe Water Deaths (UWD), Unsafe Water and Sanitation Deaths (UWSD) and Unsafe Water, Sanitation and Hygiene Deaths (UWSHD). These responses are highly correlated (see §B in S1 File). We begin by focusing on one of the responses in the data, Unsafe Sanitation Deaths, in order to illustrate the steps needed to create good-fitting models, but then broaden the analyses by reporting selected results for all responses. Complete results for all responses can be generated using the input data and R scripts described in §C in S1 File and provided in S1 Data.

Statistical Analysis
The statistical analysis method used in this study was ordinary least squares (OLS) regression. General diagnostic tests to help determine the legitimacy of using linear regression models include Q-Q plots and Shapiro-Wilk for normality of residuals, component-plus-residual plots to check for linearity between responses and predictors, Durbin-Watson to detect autocorrelation of residuals, Breusch-Pagan for constant error variance, Variance Inflation Factor for predictor correlation, and Cook's Distance and leverage plots for outlier detection [27]. Following Mosteller and Tukey's bulging rule [27], logit and power response transformations were attempted, as was square-root predictor transformation [28]. Model predictive performance was assessed using k-fold cross validation. The procedure involves randomly partitioning the data into k groups (we used 10, but the results are fairly insensitive to k), fitting an OLS regression model using k-1 of the groups, then validating on the remaining group and computing mean square prediction error. The process is repeated many times to stabilize randomization effects; we used 1,000 repetitions, as implemented in the R package DAAG [29]. We emphasize the importance of diagnostic tests and data transformations. Without careful attention paid to linear model justification, situations like those described in Bartram et al. [7] can render OLS-based results misleading and inappropriate. The issue at hand here is not inadequacy of linear models, but rather performing the transformations needed to apply them and assess their validity (e.g., see [27] chapter 4).
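The repeated k-fold procedure described above can be sketched as follows. This is a minimal, dependency-free illustration in Python, not the authors' R/DAAG code: the helper names (`solve_linear`, `fit_ols`, `repeated_kfold_mse`), the synthetic data and the small number of repetitions are all assumptions made for the example.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return OLS coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def repeated_kfold_mse(rows, y, k=10, repeats=100, seed=0):
    """Mean square prediction error on held-out folds, averaged over repeats."""
    rng = random.Random(seed)
    n = len(y)
    total = 0.0
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for each repetition
        sq_err = 0.0
        for fold in range(k):
            test_idx = order[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            coef = fit_ols([rows[i] for i in train_idx], [y[i] for i in train_idx])
            sq_err += sum((y[i] - predict(coef, rows[i])) ** 2 for i in test_idx)
        total += sq_err / n
    return total / repeats

# Synthetic data (hypothetical): two predictors and a noisy linear response.
rng = random.Random(1)
rows = [(rng.random(), rng.random()) for _ in range(60)]
y = [0.5 - 0.3 * a + 0.2 * b + rng.gauss(0, 0.05) for a, b in rows]
cv_mse = repeated_kfold_mse(rows, y, k=10, repeats=20)
print(cv_mse)
```

Comparing `cv_mse` for a full model against the value for a reduced model (fewer columns in `rows`) mirrors the kind of comparison reported later for the six-predictor and two-predictor models.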
Significance levels for all results were adjusted to control for family-wise error rate (FWER) following Bonferroni and the Holm-Bonferroni (H-B) step-down procedure [30] and for false discovery rate (FDR) following Benjamini and Hochberg (B-H) [31,32]. We report three methods reflecting two different objectives in adjusting for multiple comparisons: (1) FWER control, which limits the probability of one or more type I errors by adjusting the rejection criteria of each of the individual hypotheses or comparisons, with Holm-Bonferroni representing a more powerful and less conservative variant of the traditional Bonferroni adjustment; and (2) FDR control, which limits the expected proportion of false discoveries among the rejected hypotheses and is less stringent than FWER methods (e.g., Holm-Bonferroni). FDR-controlling procedures have greater power but at the cost of increased rates of Type I errors [33].
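The three adjustments can be illustrated with a short Python sketch. The function names and the example p-values below are hypothetical; the formulas are the standard Bonferroni, Holm step-down and Benjamini-Hochberg step-up adjustments, written out here only to make the differences concrete.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down: smallest p times m, next times m-1, ..., kept monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(pvals):
    """B-H step-up: p times m / rank, made monotone from the largest p down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical unadjusted p-values for four coefficient estimates.
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(holm(raw))
print(benjamini_hochberg(raw))
```

For any set of p-values, the B-H adjusted values are never larger than the Holm values, which in turn are never larger than the Bonferroni values, reflecting the ordering of conservatism described above.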

Basic Descriptors
Descriptive statistics of the data were generated, including untransformed predictor and response correlations, pairs plots, kernel densities of raw responses, and kernel densities of predictors and responses scaled by population (see §B in S1 File). The responses are very highly correlated. The highest negative correlations exist between PWTP and OIWS, between PWTP and UWS, and between IS and UWS. The highest positive correlation is for IS and PWTP. All response and predictor densities are, unsurprisingly, non-normal.

Initial Regression Models
The first regression models included the full complement of six WHO predictors (PWTP, OIWS, UWS, FBH, IS, HW) with unsafe sanitation deaths (USD) as response. This model yielded poor performance, as did models for all responses. In the case of USD, it explained less than 2 percent of response variance (adjusted R2 = 0.01842) with no statistically significant coefficient estimates as shown in Table 1.

Scaled Response
We then scaled response by country population, which had sizable effects on both linear regression results and outlier detection. Fig 1 shows Cook's Distances for both scaled USD response and unscaled (inset). Unscaled, India stands apart from other countries, but upon scaling by population, the Democratic Republic of the Congo (DR Congo) emerges as a distant outlier, with an Unsafe Sanitation Death Fraction (USDF) an order of magnitude greater than that of any other country. Scaling yielded an increase in fraction of variance explained (adjusted R2 = 0.1165 for USD) over the unscaled response models and considerable improvements in statistical significance for PWTP, OIWS and UWS (e.g., B-H adjusted p = 0.0248, 0.0248 and 0.025 respectively) as shown in Table 2. The F-statistic is highly significant (p = 0.0007) but relatively low in magnitude (4.165). Removing DR Congo from the data set had a pronounced effect on model fit as shown in Table 3. Adjusted R2 increased from 0.1165 to 0.6404 and the F-statistic increased to 43.45. The unadjusted p-values for IS and FBH became statistically significant at the .05 level (0.00597 and 0.01436 respectively). But note that the statistical significance levels reported for these results must be taken in proper context (residuals are not normally distributed); these models do not satisfy the conditions needed to legitimize the use of linear models [27,34].
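To make the outlier diagnostic concrete, the sketch below computes Cook's distance for a simple one-predictor regression in Python. It is a simplification under stated assumptions: the paper's models use six predictors, whereas this example uses the closed-form leverage for a single predictor, and the data (coverage percentages and deaths per 100,000) are invented for illustration, with one observation set roughly an order of magnitude above the rest.

```python
def cooks_distance(x, y):
    """Cook's distance for each observation in the simple regression y ~ a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    slope = sum((v - x_bar) * (w - y_bar) for v, w in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [w - (intercept + slope * v) for v, w in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    leverage = [1.0 / n + (v - x_bar) ** 2 / sxx for v in x]
    return [e * e / (p * s2) * h / (1.0 - h) ** 2 for e, h in zip(resid, leverage)]

# Hypothetical data: improved sanitation coverage (%) and deaths per 100,000.
# The third observation is a deliberate outlier.
coverage = [10, 20, 25, 30, 40, 50, 60, 70, 80, 90]
death_rate = [9, 8, 80, 7, 6, 5, 4, 3, 2, 1]

distances = cooks_distance(coverage, death_rate)
most_influential = max(range(len(distances)), key=distances.__getitem__)
print(most_influential, distances[most_influential])
```

In this toy data set the outlier dominates the fit in the same qualitative way that a single country can dominate a population-scaled regression, which is why refitting without it changes the results so sharply.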

Transformed Scaled Response
Transforming the data to achieve models that satisfy the conditions needed to apply ordinary least squares regression dramatically altered results in some important respects. Guided by Mosteller and Tukey's bulging rule [28] we performed logit and power response transformations and removed the Democratic Republic of the Congo (DR Congo) from the data set. Response transformation, however, presents a complication in that some countries in the data set reported zero deaths for some responses. We examined two ways of handling these cases: (1) replace zero deaths with one death and modify scaled responses accordingly; and (2) include only countries with nonzero reported deaths (this reduces the sample size from the full complement of 145 countries to 122 for USDF; sample sizes for the other responses range from 121 to 139; see S1 File §1). Table 4 contains the power response transformed results for case 1. Table 5 shows the results for case 2. We see very large increases in adjusted R2 (to 0.7022 for case 1 and to 0.8344 for case 2) and F-statistic (to 57.58 for case 1 and to 102.6 for case 2, both cases p < 2.2e-16). As summarized in S1 File §1, similar results occur for the other four responses. The sole statistically significant predictor was IS (unadjusted p = 1.06e-09 for case 1 and p = 1.26e-09 for case 2). Later results will confirm the exceptional influence of IS. Note that neither of these models meets the criteria (most notably, normality of residuals) needed to justify the use of linear models.
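The two ways of handling zero-death countries, together with the logit and power transformations, can be sketched in Python as follows. The function names, the `zero_policy` labels, the example records and the exponent used are all assumptions for illustration; in particular, the power transform is written in the Box-Cox form, and the exponent 0.25 is arbitrary rather than a value taken from the paper.

```python
import math

def death_fraction(deaths, population, zero_policy):
    """Scale deaths by population.

    zero_policy 'replace' treats zero deaths as one death (case 1);
    zero_policy 'drop' returns None so the country can be excluded (case 2).
    """
    if deaths == 0:
        if zero_policy == "drop":
            return None
        deaths = 1
    return deaths / population

def logit(fraction):
    """Log-odds of a fraction strictly between 0 and 1."""
    return math.log(fraction / (1.0 - fraction))

def power_transform(fraction, lam):
    """Box-Cox style power transform; lam = 0 is taken as the natural log."""
    if lam == 0:
        return math.log(fraction)
    return (fraction ** lam - 1.0) / lam

# Hypothetical (deaths, population) records; the second reports zero deaths.
records = [(120, 2_000_000), (0, 500_000), (45, 9_000_000)]
for policy in ("replace", "drop"):
    fractions = [death_fraction(d, n, policy) for d, n in records]
    kept = [f for f in fractions if f is not None]
    print(policy, len(kept), [round(power_transform(f, 0.25), 4) for f in kept])
```

Neither transformation is defined at a fraction of exactly zero under the log or logit, which is the computational reason a policy for zero counts is needed before transforming.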

Predictor Transformation
The next refinement in our model building involved transforming the six explanatory variables. Guided again by the bulging rule, we selected a square-root transformation and as before, transformed response (using a maximum likelihood Box Cox transformation [35]). Table 6 and Fig 2 show representative results for response = USDF, including only nonzero response countries with DR Congo removed.
UWS, FBH and IS emerge as statistically significant, adjusted R2 is 0.8533, the F-statistic is 117.3 and, as shown in Fig 2, this model satisfies the diagnostic conditions required for using a linear model. Note that for the Shapiro-Wilk, Breusch-Pagan and Durbin-Watson diagnostic tests the null hypothesis is that the corresponding model assumption holds, so a satisfactory result is indicated by a p-value that exceeds a critical (e.g., 0.05) threshold.
The Variance Inflation Factor (VIF) results (reflecting predictor collinearity) are, however, a reason to take note, especially for PWTP (but as discussed later, we have concerns about the validity and usefulness of this variable; others have expressed similar concerns [6]). Rules of thumb regarding levels of VIF becoming high enough such that linear regression results are compromised range from 4 to about 10, but as described in [36] these rules are difficult to apply generally and need not invalidate the results. Component-plus-residual plots for this model are shown in Fig 3 and reveal useful information regarding predictor contributions to model fit. HW in particular stands out as having very little explanatory power. IS and UWS clearly exhibit reasonably good fits, underscoring their dominant role as the most important explanatory variables.
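As an illustration of how VIF values arise, the Python sketch below uses the identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictor correlation matrix. The helper names and the coverage percentages are hypothetical, and the square-root transformation is applied first only to mirror the modeling step described above.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Variance inflation factor for each predictor column."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical coverage percentages for three predictors in eight countries.
pwtp = [5, 20, 35, 50, 65, 80, 30, 10]
oiws = [60, 55, 45, 35, 25, 15, 40, 50]
improved_san = [15, 30, 40, 60, 70, 90, 35, 25]
sqrt_columns = [[math.sqrt(v) for v in col] for col in (pwtp, oiws, improved_san)]
print(vif(sqrt_columns))
```

A VIF of 1 means a predictor is uncorrelated with the others; values well above the 4 to 10 range quoted above indicate that a coefficient's variance is strongly inflated by collinearity.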
We looked more carefully at relative variable importance by first computing Pratt importance scores shown in Table 7.
There is a pronounced break, dividing the predictors into most important (IS and UWS), somewhat important (FBH) and much less important predictors (OIWS, PWTP and HW), with the 95 percent confidence intervals of the scores for OIWS, PWTP and HW including zero. Another assessment of relative variable importance is possible through stepwise regression (using stepAIC and stepBIC). We found it notable that regardless of model performance criterion (AIC or BIC) and regardless of stepwise direction (forward, backward or two-way) the model shown in Table 8 was always identified as best. Moreover, this sub-setted model has about the same fraction of variance explained as its six-predictor parent, but has a considerably higher F statistic.
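For readers unfamiliar with Pratt scores, the Python sketch below shows one common definition: the product of a predictor's standardized regression coefficient and its correlation with the response, so that the scores sum to the model R-squared. It is restricted to two predictors so the standardized coefficients have a closed form, and it may differ in detail from the implementation the authors used; the function names and data are hypothetical.

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def pratt_two_predictors(x1, x2, y):
    """Pratt scores (standardized coefficient times correlation) for y ~ x1 + x2."""
    r1, r2, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    denom = 1.0 - r12 ** 2
    beta1 = (r1 - r12 * r2) / denom  # standardized coefficient of x1
    beta2 = (r2 - r12 * r1) / denom  # standardized coefficient of x2
    return beta1 * r1, beta2 * r2

# Hypothetical transformed predictors and response for eight countries.
x_first = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x_second = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
response = [0.99, 0.88, 0.90, 0.70, 0.76, 0.78, 0.51, 0.53]

score_first, score_second = pratt_two_predictors(x_first, x_second, response)
# The two scores sum to the R-squared of the two-predictor model.
print(score_first, score_second, score_first + score_second)
```

A score near zero, or a confidence interval that includes zero, indicates that a predictor accounts for little of the explained variance once the other predictors are in the model.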
Starting with the four predictor model, we then created a series of progressively smaller models that produced the results shown in Table 9 and serve to reinforce the remarkably influential roles of UWS and IS. Another test of model performance was made through k-fold cross validation. The results provide yet further evidence of the explanatory power of IS and UWS. When all six predictors are used, the 10-fold cross validation mean square error is 0.00226; when only IS and UWS are used, the corresponding prediction error increases only slightly to 0.00238.

Categorical Response Transformation
The component-plus-residual plots shown in Fig 3 are indicative of a non-homogeneous response, which prompted us to reformulate response as a categorical variable composed of USDF quartiles. We find the quartile country memberships intriguing. They are listed in Table 10.
Quartile 1, with the lowest USDF values, consists to a large extent of countries that are ex-socialist, Muslim, relatively oil rich, and/or recipients of large amounts of US aid. There are no sub-Saharan countries in this quartile. Quartile 4, with the highest USDF values, contains many countries experiencing high levels of conflict (following [37]); with the exception of Afghanistan, all of its members are sub-Saharan.

Consistency of Results for Different Mortality Responses
To illustrate the steps taken to achieve good model performance, our results thus far have focused on USDF response. An interesting and unexpected result was revealed in regression results for the remaining four responses in the WHO data set (Unsafe Water Deaths, Unsafe Hygiene Deaths, Unsafe Water and Sanitation Deaths and Unsafe Water, Sanitation and Hygiene Deaths). Whichever response was examined, the regression results remain remarkably consistent as shown in Table 11. Across mortality responses, there is little change in variance explained and no change in which predictors are statistically significant at the 5% level. Similarly, fitted predictor estimates vary little with no discrepancies in sign.

Discussion
Without response and predictor transformation, with number of deaths as response and including the Democratic Republic of the Congo (DR Congo), an OLS regression model explains essentially no observed variance and contains no statistically significant explanatory variables. Its F-statistic is approximately one (and is not significant at the 0.05 level), providing further evidence that in this model there are no significant predictors. These results occur for all modeled responses. When the response is Unsafe Sanitation Death Fraction (death count divided by country population, i.e., scaled response), we see a modest improvement in fit (R2 = 0.1165, F = 4.165) with three predictors (IS, UWS, PWTP) emerging as significant. DR Congo is an extreme outlier and also possesses high leverage [27], such that its removal increases R2 to 0.6404 and F to 43.45, with a different subset of predictors becoming significant (IS and HW). Transforming response necessitates a modification to the data (zero deaths are computationally problematic) by either: (1) setting zero deaths to a small number (e.g., 1); or (2) removing zero death countries from the analysis. We prefer approach 2 because we find zero death counts implausible. When a power transformation is performed for response USDF, R2 increases to 0.8344 with a concomitant increase in F to 102.6 (the corresponding results for approach 1 are 0.7022 and 57.58, as shown in §A in S1 File, run012). IS becomes the sole significant predictor. Transforming predictors (square root) yields a model with slightly improved R2 and F (0.8533, 117.3) and yet again a different set of significant predictors (IS, UWS, FBH).
We saw expected directions of association in the estimated predictor coefficients: negative for IS, FBH, HW and PWTP, signifying decreases in USDF with increases in these predictors, and positive for UWS. We also saw an unexpected and counterintuitive direction of association for OIWS (positive, i.e., USDF increases with OIWS). Concerns related to the interpretation and usefulness of OIWS as an indicator are discussed in [6].
Experiments involving relative variable importance reaffirmed a now common theme that emerged in this work: IS and UWS are powerful explanatory variables, with the remaining four predictors possessing far less explanatory power and PWTP and HW least useful. Regarding HW and FBH, this is perhaps not entirely unexpected given the caveat in [1]: "Data based on limited country survey data, and modelled data provided for countries without survey information. These data should therefore be interpreted with caution, and provide indicative values only." Another caveat is critically important. The results that we have presented for mortality responses plausibly do not apply when responses involve morbidity. More specifically, those predictors that we found uninfluential for predicting mortality-related responses may indeed be important when morbidity is the response. Diarrheal disease is a good example, where convincing evidence exists that hand washing (a predictor of essentially no use in predicting mortality) is effective in reducing morbidity [14,38]. This example again brings into play the important observation that piped water to premises does not necessarily mean safe water, with attendant consequences for morbidity and yet another outcome, malnutrition, which is in turn connected in a complex manner with both morbidity and mortality [14]. These results raise the following questions: (1) What are some of these predictors actually measuring? Do problems lie in their basic definitions? (2) Is the data for some of the predictors valid? (3) Are the responses reported in [1] truly measuring different phenotypes? The regression results reported in Table 11 (these are typical; other subsets of analyses look very similar across responses) are indifferent with respect to response-predictor associations. This seems at odds with definitions of what these responses and predictors are supposed to be measuring.
We cannot fully explain this result, other than to speculate that the responses themselves are broadly capturing something, but are unable to resolve more specific predictor associations. A similar argument can perhaps be made for predictor non-specificity. From a somewhat broader perspective, it seems clear that all data, predictors and responses alike, is likely to be very heterogeneous. What might that unknown heterogeneity portend for data representativeness? More specifically, we refer, for example, to the potential for conflict to corrupt or otherwise distort reported data [11]. Is the result that Unsafe Sanitation Death Fraction for the Democratic Republic of the Congo exceeds that of all other countries by an order of magnitude accurate, and if so, what is the role of decades of conflict and large-scale population movements? These may have destroyed or made inaccessible a great deal of WASH-related infrastructure along with the health care systems that might have compensated for this loss. Of course, all of this is speculation, but the numbers suggest that we need to take specific contexts into account when we address these issues.
A second area of interest concerns those predictors that did not have a significant impact on mortality outcomes. In the data, the category of "piped water to premises" is the highest level of water supply, implying that it should have a significant beneficial effect. That it does not suggests that the underlying reality is ambiguous. Piped water supplies may be intermittent. This may cause people to store water, and the storage becomes the source of contamination. Or compromised distribution systems may allow contaminants to infiltrate. If people assume that piped water is clean water, they may cease to follow traditional safeguards such as boiling. In this context, research showing the degree to which improved water sources including piped water may be contaminated is particularly relevant [3,31].

Conclusions
Piped Water to Premises (PWTP) and handwashing (HW) had little value in predicting any mortality response. Good fitted model performance required: (1) the use of population-scaled death fractions as opposed to death totals; (2) transformed response (logit or power); and (3) predictor transformation (square root). The best models passed diagnostic tests for normality of residuals, linearity between predictors and response, and constant error variance, and exhibited remarkable performance given the heterogeneity of the countries involved and the complexity of the relationships between response and predictors. In the case of population-normalized Unsafe Sanitation Death fraction as response, the model explained about 85% of the observed variance, with a high F-statistic and highly statistically significant predictor p-values. Two predictors, Improved Sanitation and Unimproved Water Sources, were most influential. The fact that improved sanitation, closely followed by unimproved water source, is always the most important predictor (regardless of mortality response) seems to us highly suggestive. In the absence of adequate blinded randomized controlled trials to determine the relative importance of water supply and sanitation or to understand how they interact, the strong signal in this data may provide some guidance. It suggests that efforts to provide clean water in concert with adequate sanitation will have the greatest impact and that adequate sanitation should be accorded a high priority in WASH policy.
The poor performance of other WHO predictors, specifically PWTP and HW, raises important questions. Previous research has suggested that PWTP, although it seems to imply good water quality, is not a guarantee of that [3,31]. Our results reinforce that conclusion, but, as we have suggested, its interpretation is not straightforward. HW, as a strictly behavioral variable, is particularly difficult to measure. Understandably, most of the field research in this area has been concerned with the effects of WASH interventions on health as an output. It may be that some research effort should also be directed to evaluating what these predictors are really able to tell us about WASH-related facilities as inputs.
There are many reasons to focus on the provision of clean water. Fetching water from any distance is a time-consuming burden that most often falls on women and girls. This reduces the time available for education and other productive activities. In some parts of the world, it puts the women and girls at direct physical risk. Further, fetching water over distances is likely to involve storage in the home, increasing the risk of recontamination even if the source water is reasonably clean.
If the goal is improved human health, however, our results suggest that the provision of clean water by itself is unlikely to achieve the desired outcome. Rather, the results indicate that the provision of clean water needs to be accompanied by improved sanitation in order for significant health benefits to be realized. We reiterate that while our focus in this paper is mortality, morbidity is a staggering consequence of inadequate water, sanitation and hygiene. Moreover, lower impact on mortality may not mean a similarly low impact on morbidity. That adequate sanitation plays an important role in health outcomes is not a new proposition. In the literature, the relative importance of water quality and sanitation in improving health outcomes has remained an open question [2,9,16,38-45]. Our results suggest that a higher priority may need to be given to improved sanitation than has been the case. In this analysis, we have been able to show how important it is with a much higher degree of confidence. Further, our analysis suggests that we need to examine the meaning of the predictors on which we have been relying.