Differences in levels of E. coli contamination of point of use drinking water in Bangladesh

This study aimed to quantify the inequalities and identify the associated factors of the UN sustainable development goal (SDG) targets in relation to safe drinking water. The concentration of the gut bacterium Escherichia coli in drinking water at the point of use (POU) and other information were extracted from the latest wave of the nationally representative Bangladesh Multiple Indicator Cluster Survey (MICS 2019). Bivariate and multivariable multinomial logistic regression models were used to identify potential predictors of contamination, whereas classification trees were used to determine specific combinations of background characteristics with significantly higher rates of contamination. A higher risk of contamination from drinking water was observed for households categorized as middle or low wealth who collected water from sources with higher concentrations of E. coli. Treatment of drinking water significantly reduced the risk of higher levels of contamination, whereas owning a pet was significantly associated with recontamination. Regional differences in the concentrations of E. coli present in drinking water were also observed. Interventions in relation to water sources should emphasize reducing the level of E. coli contamination. Our results may help in developing effective policies for reducing diarrheal diseases by reducing water contamination risks.


Funding
The author(s) received no specific funding for this work. The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Abstract
The aim of this research is to provide empirical evidence to monitor progress, to quantify inequalities and to identify factors associated with the sustainable development goal (SDG) targets in relation to safe drinking water. The level of E. coli concentration in drinking water at the point of use (POU) and other information were extracted from the latest wave of the nationally representative Bangladesh Multiple Indicator Cluster Survey (MICS 2019).
Bivariate and multivariable multinomial logistic regression models were used to identify factors associated with contamination, and a classification tree was used to find the specific combinations of background characteristics with significantly higher rates of contamination.
A higher risk of drinking water contamination was observed if the household was in the middle or low wealth category and collected water from a source with a higher level of E. coli concentration. Treatment of drinking water significantly reduced the risk of a higher level of contamination, whereas having a pet was significantly associated with secondary contamination. Regional inequalities in the E. coli concentration of drinking water were evident.

Introduction
Diarrheal diseases caused 1.6 million deaths globally in 2017, including half a million children, predominantly in low- and middle-income countries; this high mortality can be substantially reduced through safe drinking water interventions [1]. Access to safe drinking water is a human right [2], as reflected in the Millennium Development Goals (MDG) target of 'access to improved source' [3] and in the SDG target of 'safely managed' drinking water [4,5]. Contemporary studies have shown that drinking water at the point of use (POU) is more likely to be contaminated than water at the main source; hence, SDG targets that go beyond the infrastructure of the source are more practical, though more challenging [6,7]. The quality of drinking water in relation to health issues may be assessed through the presence or concentration of microbial contamination [8,9].
The SDG target of 'safely managed' water, in terms of being free from microbial contamination, can be assessed using the presence or concentration of E. coli (or other microbes) in drinking water.
Based on the concentration of contamination, drinking water at the source or at the POU is categorized with respect to potential health risks [10,11]. To monitor progress towards safely managed drinking water under the SDG targets and to develop related policy recommendations, contemporary literature has aimed to determine the factors associated with E. coli concentration in drinking water. One of the major sources of contamination is the collection point, where drinking water may be exposed to germs from an unhygienic environment. Sources are categorized as 'improved' or 'unimproved' based on their ability to supply safe drinking water through their construction or treatment [12]. Water collected from an 'unimproved' source is more likely to be contaminated during extraction, and this contamination may be carried over to the drinking water at the POU [10,13-15]. The type of main drinking water source and the level of contamination at the collection point are associated with the level of contamination at the POU, though many improved sources may have faecal contamination above the WHO standard [16,17]. A higher level of faecal contamination in household water is associated with unimproved sanitation facilities [18]. If households practise open defecation or use unimproved facilities, germs from excreta can reach the water source and, consequently, the quality of drinking water deteriorates. Secondary contamination of drinking water occurs due to poor management of household water resources, such as unhygienic storage practices or water-related behaviours, which are not uncommon in developing countries [19,20]. A clear understanding of the health impacts of consuming unsafe drinking water, and of the safe handling practices that maintain the quality of potable water, can be achieved by passing information to individuals through educational institutions and/or community initiatives.
A positive association between hygienic water practices and the level of educational attainment is reported in the contemporary literature [21].
An increased risk of a higher level of contamination of drinking water at the point of consumption is significantly associated with ownership of livestock [22]. For example, a study using data from Ghana, Nepal and Bangladesh showed that ownership of any type of livestock was associated with an increased risk of faecal contamination of drinking water at the point of use [23,24].
Adequate water treatment methods at the point of use can significantly reduce the presence and total count of coliforms in drinking water [25-27]. Socio-economic inequality among households is reflected in access to quality livelihood, especially in access to potable water. A higher asset index score, measured through household possessions, was significantly associated with access to improved water sources and reduced E. coli contamination in drinking water [28,29].
In the literature, the general practice is to convert the E. coli concentration of drinking water into categories of potential health risk. The standard procedure for measuring the association of a categorical outcome variable with a set of covariates is to estimate adjusted odds ratios in a multivariable analysis [30]. Depending on the number of categories, a binary or trichotomous logistic regression model is fitted [10,31]. For an ordered outcome variable with three levels (low, medium and high risk), an ordered logit model is preferred. The spatial correlation in the distribution of the outcome variable is often incorporated using a Bayesian multivariable modelling framework [7].
This research aimed to understand the distribution of, and inequalities in, the E. coli concentration of drinking water at the point of use in households of Bangladesh. To accomplish this aim, associations between E. coli concentration and a set of covariates were measured. The unadjusted and adjusted effects were estimated using bivariate and multivariable analyses. A machine-learning tool, the classification tree, was used to identify the distribution of E. coli concentration over interactions of the predictor variables. The analyses were conducted using the latest data from the nationally representative Multiple Indicator Cluster Survey (MICS) of Bangladesh, conducted in 2019. The research helps to better understand progress towards SDG target 6.1 and provides empirical evidence to support the development of feasible and effective plans to reach the target.

Data
This study used the latest wave of the Multiple Indicator Cluster Survey conducted in Bangladesh in 2019 (MICS 2019). The survey was designed to provide reliable estimates at the national level, across urban and rural areas, across administrative divisions, and across the districts of Bangladesh. A two-stage, stratified cluster sampling technique was adopted. The first-stage sampling frame consisted of the primary sampling units (PSUs), defined as the enumeration areas (EAs) of the latest Bangladesh Population and Housing Census 2011. The main strata were defined as the urban and rural locations within each of the 64 districts. A probability proportional to size (PPS) sampling procedure was used to select the PSUs (3,220 in total) from the sampling strata. For each of the selected EAs, a complete list of households was prepared for the next sampling stage. A systematic random sample of 20 households was drawn from each of the 3,220 EAs selected in the first stage. From the 20 households selected in each EA, 4 households were selected for assessing arsenic concentration in drinking water. From these four households in each EA, two households were randomly sampled for assessing E. coli content in household drinking water and at the 'source' of drinking water. The sampling in these two steps was done using a systematic random sampling technique. Thus, the expected sample size of this study was 6,440 households selected for E. coli testing. A total of 6,069 (98.7%) of the selected households were successfully tested for both household and source water quality for E. coli. Seventeen cases, for which results were lost or not readable, were excluded from the study.
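The sub-sampling arithmetic described above can be illustrated with a short sketch. It is not the survey's own procedure: the function names are hypothetical, and the systematic sampler is a generic random-start, fixed-interval scheme assumed here only for illustration.

```python
import random

# Design constants taken from the survey description above.
N_EAS = 3220          # enumeration areas (PSUs) selected in the first stage
HH_PER_EA = 20        # households sampled per EA
ARSENIC_PER_EA = 4    # households per EA tested for arsenic
ECOLI_PER_EA = 2      # households per EA tested for E. coli


def expected_sample_sizes():
    """Expected number of households at each sub-sampling step."""
    return {
        "households": N_EAS * HH_PER_EA,
        "arsenic": N_EAS * ARSENIC_PER_EA,
        "ecoli": N_EAS * ECOLI_PER_EA,
    }


def systematic_sample(units, n, rng):
    """Generic systematic sample: random start, then a fixed interval.

    Hypothetical helper; the actual MICS selection routine may differ.
    """
    if n <= 0 or n > len(units):
        raise ValueError("n must be between 1 and len(units)")
    step = len(units) / n
    start = rng.random() * step
    last = len(units) - 1
    return [units[min(int(start + i * step), last)] for i in range(n)]
```

With these constants, `expected_sample_sizes()["ecoli"]` equals 3,220 x 2 = 6,440, which matches the expected sample size stated in the text.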

Outcome variable
The dependent variable in this study is the quality of drinking water in terms of possible faecal contamination. The water samples were collected from the household by asking for 'a glass of water that you would give a child to drink'. The most widely recommended indicator of faecal contamination of drinking water in Bangladesh is the number of Escherichia coli (E. coli) bacteria in a 100 ml sample of drinking water. The number of blue colonies, as a measure of E. coli colony forming units (cfu), was recorded by the MICS teams in the field. For this purpose, 100 ml of sample water was filtered through a 0.45 micron filter (Millipore Microfil®), and the filter was placed onto a Compact Dry EC growth media plate. After 24 hours of incubation at ambient temperature, the number of blue colonies was recorded. The drinking water quality guidelines recommended by the World Health Organisation (WHO) were followed to categorise the recorded counts into different risk levels. Household drinking water with less than one blue colony was termed 'low risk', samples with 1 to 10 colonies were categorised as 'medium risk', and samples with 11 to 100 colonies and more than 100 colonies were categorised as 'high' and 'very high' risk, respectively. In this research, the last two categories are combined into a single 'high risk' category.
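As a minimal sketch of the categorisation described above, the mapping from a colony count to the three risk levels used in this research could be written as follows. The function name and the label strings are hypothetical; the cut-offs are those stated in the text, with 'high' and 'very high' merged.

```python
def risk_category(cfu_per_100ml):
    """Map an E. coli count (cfu per 100 ml) to the three-level risk category.

    Cut-offs follow the text: <1 low, 1-10 medium, 11 or more high
    (the 'high' and 'very high' WHO categories combined).
    """
    if cfu_per_100ml < 0:
        raise ValueError("colony count cannot be negative")
    if cfu_per_100ml < 1:
        return "low"
    if cfu_per_100ml <= 10:
        return "medium"
    return "high"
```

For example, a plate with 10 blue colonies falls in the medium category, while a plate with 11 colonies falls in the combined high category.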

Predictor variables
In this study, a set of predictor variables was considered to test possible associations with the outcome variable. The choice of predictor variables was guided by the existing literature, the knowledge of the researchers and the availability of information. E. coli contamination in drinking water may be carried over from the source of water collection. Contamination at the source was recorded and categorized in the same way as the outcome variable. The type of drinking water source (categorized as improved or unimproved) was considered as a potential predictor variable. The location of the drinking water source may be linked to water quality in two ways: the source may be located in an area with a cleaner environment, or the water may be contaminated while being carried. Based on the location of the drinking water source, households were categorized as having the source in the dwelling or premises, or elsewhere. The other predictor variables included in the analysis were whether the water was treated or untreated, the type of toilet facility (improved or unimproved), place of residence and administrative division. This research tested the hypothesis that the educational attainment of the household head influences the behaviour of household members towards consuming safe drinking water.
Based on educational attainment, household heads were categorized as having no education or pre-primary education, primary or secondary education, or higher education. Several studies have observed a positive association between ownership of livestock and contamination of water.
Based on the ownership of any livestock, herds, other farm animals or poultry, households were categorized as either owning or not owning livestock. The study also tested whether wealthy households, with better management, were able to keep contamination at a lower level. Based on household wealth, households were categorized as poor, middle or rich.

Statistical Analysis
In specifying the determinants of E. coli contamination in drinking water, the bivariate effects of the selected characteristics were examined first. As the outcome variable and all covariates are categorical, bivariate chi-square ($\chi^2$) analyses were carried out to identify the covariates significantly associated with the level of E. coli contamination. However, a bivariate association between two variables does not necessarily reflect the true relationship, as this measure is not adjusted for other covariates. Therefore, a multivariable approach was applied to determine the adjusted degrees of association between the covariates and the outcome variable. The outcome variable, E. coli contamination in household drinking water, has three levels, coded 2 (for high risk), 1 (for medium risk) and 0 (for low risk, the reference category), and was modelled using a multinomial logistic regression:

$$P(Y = 0 \mid X) = \frac{1}{1 + e^{X\beta_1} + e^{X\beta_2}}, \qquad P(Y = 1 \mid X) = \frac{e^{X\beta_1}}{1 + e^{X\beta_1} + e^{X\beta_2}}, \qquad P(Y = 2 \mid X) = \frac{e^{X\beta_2}}{1 + e^{X\beta_1} + e^{X\beta_2}},$$

where $\beta_1$ is the set of regression coefficients associated with outcome 1, $\beta_2$ is the set of regression coefficients associated with outcome 2, and $X$ is the vector of explanatory variables, that is, $X = (x_1, x_2, \ldots, x_k)$, where $k$ is the number of explanatory variables.
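To make the three-category model concrete, the following sketch computes the outcome probabilities for one household from two coefficient vectors, taking the low-risk category as the reference. The function name is hypothetical, and any coefficient values passed to it are purely illustrative; they are not estimates from this study.

```python
import math


def multinomial_probs(x, beta1, beta2):
    """Return (P(low), P(medium), P(high)) for covariate vector x.

    beta1 and beta2 are the coefficient vectors for outcomes 1 (medium)
    and 2 (high); outcome 0 (low) is the reference category. If an
    intercept is wanted, x should start with 1.0 (an assumption of this
    sketch).
    """
    if not (len(x) == len(beta1) == len(beta2)):
        raise ValueError("x, beta1 and beta2 must have the same length")
    eta1 = sum(b * v for b, v in zip(beta1, x))   # linear predictor, outcome 1
    eta2 = sum(b * v for b, v in zip(beta2, x))   # linear predictor, outcome 2
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    return (1.0 / denom, math.exp(eta1) / denom, math.exp(eta2) / denom)
```

The three probabilities always sum to one, and the ratio P(medium)/P(low) equals the exponential of the first linear predictor, which is why exponentiated coefficients are interpreted relative to the reference category.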
To identify significant multifactor interactions of the covariates associated with the level of E. coli contamination, a classification tree method was used. The methodology is guided by the conditional inference framework [37,38]. The squared adjusted generalized variance inflation factor (GVIF) scores were used to quantify multicollinearity in the model [39]. Statistical software was utilized to analyze the data and fit the models.
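The variable-selection step at a single node of a conditional inference tree can be sketched roughly as follows: each covariate is tested for association with the outcome, the p-values are adjusted for multiple testing, and the node is split on the covariate with the smallest adjusted p-value only if it falls below a threshold. This sketch is an assumption-laden simplification, not the software used in the study: it uses a Pearson chi-square statistic with a Monte Carlo permutation p-value and a Bonferroni adjustment, and all function names are hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    row, col, cell = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(xs, ys, n_perm, rng):
    """Monte Carlo permutation p-value for the chi-square statistic."""
    observed = chi_square_stat(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_stat(xs, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(covariates, outcome, alpha=0.05, n_perm=500, seed=0):
    """Pick the covariate with the smallest Bonferroni-adjusted p-value.

    Returns (name, adjusted_p_values); name is None when no covariate
    is significant at level alpha, meaning the node is not split.
    """
    rng = random.Random(seed)
    m = len(covariates)
    adjusted = {
        name: min(1.0, m * permutation_p_value(values, outcome, n_perm, rng))
        for name, values in covariates.items()
    }
    best = min(adjusted, key=adjusted.get)
    return (best if adjusted[best] <= alpha else None), adjusted
```

A full tree would then search for the best binary split within the selected covariate and repeat the procedure in each child node; that part is omitted here.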

Results
The results of the bivariate analysis of the relationships between E. coli contamination in drinking water and potential covariates are presented in Table 1. A higher level of contamination at the source of drinking water was associated with a higher level of contamination at the point of use, and the association was statistically significant (p < 0.001).
The type of drinking water source and the location of the facility were significantly associated with the level of E. coli contamination at the POU. E. coli contamination was significantly higher in water from unimproved sources than in water from improved sources. Contamination was significantly lower in water collected from sources located in the household or premises than in water from sources located outside. The bivariate analysis also indicated that the proportion of households with E. coli contamination was lower among those using any water treatment method.
A trichotomous logistic regression was employed in the multivariable analysis to quantify the adjusted effects of the covariates on the three-level E. coli contamination of drinking water.
All variables that were significant in the bivariate analysis were included in the model. Because of possible multicollinearity between the source and the location of drinking water, the location of the drinking water source was excluded from the model. The model outputs, along with adjusted odds ratios (AOR) and 95% confidence intervals (CI), are presented in Table 2.
In the trichotomous logistic regression model, a higher level of E. coli concentration at the source of drinking water was significantly associated with the higher level of E. coli
The awareness programs should include the risks of secondary contamination by cattle or through improper storage systems. Significant regional inequalities in microbial contamination demand the adoption of alternative approaches at the local level. Bangladesh must ensure safe drinking water sources for the population in order to make further inroads towards reducing mortality from diarrheal diseases and to reduce overall contamination risks.
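For readers less familiar with how adjusted odds ratios and their confidence intervals are derived from a fitted logistic model, the following sketch shows the usual Wald-type calculation from a coefficient and its standard error. The function name is hypothetical, and any inputs are illustrative rather than values from Table 2.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) of a Wald-type interval.

    beta is a fitted log-odds coefficient and se its standard error;
    z = 1.96 gives an approximate 95% confidence interval.
    """
    if se < 0:
        raise ValueError("standard error cannot be negative")
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from zero at roughly the 5% level.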

Conclusion
As in other low- and middle-income countries, microbial contamination of drinking water is a major public health concern in Bangladesh. Applying multivariable statistical models and machine learning tools to the latest nationally representative survey in Bangladesh, this study identified factors associated with E. coli concentration in drinking water at the POU.
Despite nearly universal access to 'improved' sources, E. coli concentration in drinking water at the POU was identified as a major public health concern in Bangladesh. This contamination was significantly associated with the contamination of water at the source but not with the type (improved or unimproved) of the source. Hence, the key causes of contamination at the source of collection should be identified, and measures should be taken to avoid such contamination. Secondary contamination of drinking water was significantly associated with the ownership of livestock. Rural households should be educated about the possible secondary contamination of drinking water through livestock. The use of water treatment significantly reduces the E. coli contamination of drinking water, though its use is limited. An integrated campaign on the importance of treating water before drinking may increase the current rate of water treatment use. Potential water treatment users should also be educated on how to use the methods effectively to make water safe, and the materials should be readily available.
Ethical approval: Ethical approval for the analyses was not sought as the paper is based on deidentified data.

Declaration of interest:
This research does not contain any conflict of interest.

List of Tables:
Table 1: Percentage distribution of households with various levels of E. coli contamination in drinking water at the point of use.
Table 2: Adjusted odds ratio, confidence interval and p-value of moderate and high risk of E. coli contamination of drinking water at the point of use obtained from the multinomial logistic regression models.

List of Figures:
Figure 2: Classification tree representing the distribution of the level of E. coli concentration of drinking water over the combinations of the levels of the household characteristics.