## Abstract

### Introduction

We describe a method for analyzing the within-household network dynamics of disease transmission. We apply it to analyze the occurrences of endemic diarrheal disease in Cameroon, Central Africa, based on observational, cross-sectional data available from household health surveys.

### Methods

To analyze the data, we apply the formalism of the dynamic SID (susceptible-infected-diseased) process that describes the disease steady-state while adjusting for the household age-structure and environment contamination, such as water contamination. The SID transmission rates are estimated via an MCMC method with the help of the so-called synthetic likelihood approach.

### Results

The SID model is fitted to a dataset on diarrhea occurrence from 63 households in Cameroon. We show that the model allows for quantification of the effects of drinking water contamination on both transmission and recovery rates for household diarrheal disease occurrence as well as for estimation of the rate of silent (unobserved) infections.

### Conclusions

The new estimation method appears capable of genuinely capturing the complex dynamics of disease transmission across various human, animal and environmental compartments at the household level. Our approach is quite general and can be used in other epidemiological settings where it is desirable to fit transmission rates using cross-sectional data.

### Software sharing

The R-scripts for carrying out the computational analysis described in the paper are available at https://github.com/cbskust/SID.

**Citation:** Woroszyło C, Choi B, Healy Profitós J, Lee J, Garabed R, Rempala GA (2018) Modeling household transmission dynamics: Application to waterborne diarrheal disease in Central Africa. PLoS ONE 13(11): e0206418. https://doi.org/10.1371/journal.pone.0206418

**Editor:** Iratxe Puebla, Public Library of Science, UNITED KINGDOM

**Received:** June 17, 2017; **Accepted:** October 13, 2018; **Published:** November 7, 2018

**Copyright:** © 2018 Woroszyło et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are in the Supporting Information files.

**Funding:** The funders include the Mathematical Biosciences Institute at The Ohio State University through its National Science Foundation grant (NSF-DMS1440386 to GR) and the National Research Foundation of Korea grant (NRF-2017R1D1A3B03031008 to BC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Diarrhea often occurs as a symptom of an infection in the intestinal tract caused by a bacterial, viral or parasitic organism. Such infections are typically spread through drinking water, contaminated food, or from animal-to-person and from person-to-person as a result of poor hygiene [1, 2]. Most people who die from diarrheal diseases actually die from severe dehydration and fluid loss. Children who are malnourished or have impaired immunity as well as people living with HIV are most at risk of life-threatening diarrhea. Indeed, diarrhea is one of the primary killers of young children worldwide, with an estimated 1.7 billion annual cases of diarrhea among children under 5 resulting in over 500,000 deaths, the majority occurring in low- and middle-income countries [3].

Although diarrheal disease is common across all economic settings, it has the most potential to cause severe consequences when resources and medical care are limited or when co-morbidities are present. Acute episodes of disease more quickly lead to dangerous dehydration, while chronic gastrointestinal infection is now thought to be linked to environmental enteric disorder (EED), which results in a chronically damaged gut, reduced immunity, and stunted growth [4]. Loss of linear growth, particularly in a child’s first years of life, can have long lasting impacts on cognitive and motor development [5]. However, consequences of disease aside, it remains unclear how differences in exposures and susceptibility play a role in the overall difference observed between children’s and adults’ diarrhea incidence.

In looking at household exposures to pathogens that may cause diarrhea, it appears that the interaction with animals (whether pets, livestock, or wildlife) plays an important role [3, 4, 6]. However, the potential of childhood animal exposure to modulate immunity and allergies [7–10] and of livestock ownership to improve nutrition and economic stability for families [4, 6] means that the specific role of household animals in transmission of diarrheal disease is complicated and needs to be clarified and better quantified with the help of a more mechanistic model.

The investigation of mechanisms behind household transmission of pathogens that cause diarrhea is not easy due to the complexity of the disease and its persistent endemic state in the global human population [3]. In general, the disease may be caused by both human-specific and zoonotic pathogens that have a variety of life cycles, and the sheer number of potential culprits makes determining the specific cause of all observed cases on any sort of large scale practically impossible [3, 11]. In many developing countries (including most of Africa, see [12]) the problem is additionally compounded by the fact that most of the health surveillance programs operate with limited resources, and the data to assess transmission of diarrhea is generally limited to demographics, reported incidence of diarrhea, and possibly some outcome measures on households or individuals testing positive for a particular pathogen [11, 13]. Although such data may be used with the traditional mechanistic models to ascertain the role of different pathogens and transmission pathways on incidence of diarrhea [14, 15], the traditional models have difficulty adjusting for the presence of an unrelated, endemic baseline of diarrhea occurrences [16, 17].

### Our contribution

In this paper we propose a method of modeling transmission pathways of diarrhea using symptoms occurrence data from individual households consisting of family members with different susceptibility (for instance, children and adults) in an environment subject to waterborne pathogen contamination and possibly also other risks affecting baseline incidence rates. Our approach is quite general and allows adjusting not only for these different causes of diarrhea, even with data of poor resolution, but also for a variety of confounders typically encountered in similar observational studies. The particularly relevant confounders in our setting are the cases of non-symptomatic infectives and uninfected symptomatics. An example of a dataset of interest, as obtained from a cross-sectional study of Cameroon households, is presented in Table 1. The dataset is especially interesting as it matches the results of household drinking water testing for pathogens with detailed household health survey and demographic data. For the type of observational data in Table 1 we propose to first fit the household-level occurrence model and then to apply a parametric resampling technique akin to *the synthetic likelihood* (see, e.g., [18, 19]) to obtain an approximate distribution of the mean occurrence across households. Due to some general approximation results for a wide class of counting processes (see [20], chapter 11) we may assume here that the mean of the diarrhea occurrence is well-approximated by the stationary state of a certain system of ordinary differential equations (ODEs) with additive normal noise. The main idea behind our proposed approach is summarized in Fig 1.

The count data *X*_{1}, …, *X*_{M} represent household-level diarrhea cases among adults and juveniles under contaminated (*V* = 1) and clean (*V* = 0) environments and are used to fit the generative model (observed likelihood) based on (1). The generated pseudo-data are then used to fit the SID model based on (2) and (3).

Note that without the intermediate resampling step it is in general not possible to obtain the estimates of the transmission dynamics simply from cross-sectional occurrence. However, under the assumption of a constant risk (which is typically tacitly made in similar studies) we may consider the observed cases of diarrhea as a statistical sample from a stationary disease process. In this case, the ODE parameters may be identified using Bayesian inference techniques with the help of an MCMC algorithm (see, e.g., [21]). Using this approach we are able to obtain all relevant posterior estimates, including transmission rates and the expected count of latent infections (infection present but no diarrhea symptoms) as well as disease-unrelated occurrences (diarrhea symptoms present but no infection). Details of the analysis method are provided in the next section. To our knowledge, ours is the first application of the synthetic likelihood/resampling method to observational data on household diarrhea occurrence. We hope that similar approaches can be applied to larger datasets and consequently help improve current guidelines for treatment and intervention for diarrhea [2].

## Materials and methods

### Occurrence data and observed likelihood

To perform our analysis we use data from the observational study investigating the relationship between household drinking water quality and diarrhea occurrence in Maroua, Cameroon. The data was described in [22] and more recently in [12]. Briefly, the study examined the relation between the occurrences of diarrhea and the presence of gastrointestinal pathogens within home drinking water sources in four urban neighborhoods in Maroua, the regional capital of northern Cameroon. For the purpose of the study diarrhea was defined as three or more loose bowel movements (“selles molles” in French) per day.

Heads of household assented to participation in the study with the use of a verbal consent script. In addition, other members of the household present for the survey assented to the survey and water sampling. Assent was recorded through use of a verbal consent script by the technicians collecting samples and administering the survey. The protocol was approved by the Ohio State University Institutional Review Board/ Human Research Protection Program (Federal-wide Assurance #00006378 from the Office for Human Research Protections in the Department of Health and Human Services: protocol 2010B0004). Within this ethical review for the survey the protocol was approved for a waiver of signed consent forms due to the low literacy of the population and cultural inappropriateness of obtaining signatures to record consent.

Diarrhea occurrence data and water samples from home water storage containers were collected from *M* = 63 households. Pathogen contamination was assessed using a qPCR method, targeting several potential zoonotic pathogens including *Campylobacter* spp., Shiga toxin producing *Escherichia coli* (*stx*1 and *stx*2), and *Salmonella* spp. Microbial source tracking (MST) targeted three different host-specific markers: HF183 (human), Rum2Bac (ruminant) and GFD (poultry) to identify fecal contamination sources. For the purpose of our analysis below the pathogen/MST levels in each household were encoded as binary outcomes (water contamination present/absent) and combined with collected demographic information on the number of household members, their age and the history of diarrhea symptoms within the last 14 days. Two neighborhoods tested positive for most pathogens/MST while the others only tested positive for one or two. As *E. coli* was found in all water samples, it was excluded from our contamination criterion. Spatial variation of pathogens/MST existed between sources, storage containers, and neighborhoods but was not included in the set of covariates for the current analysis due to small sample sizes of different spatial patterns. Differing population density and ethno-economic characteristics could potentially explain and correct for the variation but for the sake of simplicity we have not performed such analysis here. For illustration, several data points from the Cameroon dataset are listed in Table 1 where the diarrhea occurrences are recorded separately for adult and juvenile (under 15 years old) household members. The total numbers of juveniles and adults in the water-contaminated (resp. uncontaminated) households were *N*_{J}(1) = 103 and *N*_{A}(1) = 111 (resp. *N*_{J}(0) = 99 and *N*_{A}(0) = 155).

Assuming that the data in Table 1 constitutes a sample from the cross-sections of a stationary distribution, each datapoint may be represented as a pair of occurrences of diarrhea (*D*_{J}, *D*_{A}) observed, respectively, in juvenile and adult compartments of random size (*N*_{J}, *N*_{A}). Because the sample mean and variance of the compartment sizes are approximately equal for both the juvenile and the adult compartments, independent Poisson distributions are assumed for their respective sizes. Given the compartment sizes and the status of water contamination, the respective numbers of occurrences within compartments are assumed to follow binomial distributions with probabilities *p*_{J}(*V*) and *p*_{A}(*V*) where *V* ∈ {0, 1} denotes the absence or presence of the water contamination. Although we do not model it explicitly due to small sample sizes, we tacitly assume a functional relationship between *p*_{J}(*V*) and *p*_{A}(*V*). In summary, for the compartments of sizes *N*_{J}, *N*_{A}, the number of symptomatic (diseased) individuals denoted by *D*_{J}, *D*_{A}, and the household contamination status *V*, we assume the following *generative model* for the data

$$
\begin{aligned}
N_J &\sim \mathrm{Poisson}(\lambda_J), & N_A &\sim \mathrm{Poisson}(\lambda_A),\\
D_J \mid N_J, V &\sim \mathrm{Binomial}\big(N_J,\, p_J(V)\big), & D_A \mid N_A, V &\sim \mathrm{Binomial}\big(N_A,\, p_A(V)\big).
\end{aligned}
\tag{1}
$$

Under the above model, likelihood-based inference may now be performed to estimate the compartment- and contamination-specific vector of parameters *η*_{V} = (*p*_{J}(*V*), *p*_{A}(*V*), *λ*_{J}, *λ*_{A}) for *V* ∈ {0, 1}. For ease of notation, in what follows we suppress the subscript *V* when describing the parameters. Further details are provided in S1 Appendix. The numerical values of the estimated parameters are given in the next section.
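As an illustration of the generative model, the following Python sketch draws household counts from independent Poisson compartment sizes and binomial symptom counts, and computes closed-form maximum-likelihood estimates within one contamination stratum. It is a minimal sketch under stated assumptions rather than the authors' R implementation: the function names (`sample_household`, `fit_generative`, `pseudo_average`) are hypothetical, the estimator shown is the textbook one (the exact procedure is in S1 Appendix), and *λ* is estimated here per stratum without any pooling across *V*.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_household(rng, lam_j, lam_a, p_j, p_a):
    """Draw (N_J, N_A, D_J, D_A) for one household in a fixed stratum V."""
    n_j = sample_poisson(rng, lam_j)
    n_a = sample_poisson(rng, lam_a)
    d_j = sum(rng.random() < p_j for _ in range(n_j))  # Binomial(n_j, p_j)
    d_a = sum(rng.random() < p_a for _ in range(n_a))  # Binomial(n_a, p_a)
    return n_j, n_a, d_j, d_a


def fit_generative(households):
    """Closed-form MLEs: mean size for lambda, pooled proportion for p."""
    m = len(households)
    tot_nj = sum(h[0] for h in households)
    tot_na = sum(h[1] for h in households)
    tot_dj = sum(h[2] for h in households)
    tot_da = sum(h[3] for h in households)
    return {
        "lam_j": tot_nj / m,
        "lam_a": tot_na / m,
        "p_j": tot_dj / tot_nj if tot_nj else float("nan"),
        "p_a": tot_da / tot_na if tot_na else float("nan"),
    }


def pseudo_average(rng, params, m):
    """Average of m simulated households, i.e. one pseudo-data point."""
    draws = [sample_household(rng, **params) for _ in range(m)]
    return tuple(sum(col) / m for col in zip(*draws))
```

Calling `pseudo_average(rng, fit_generative(data), 63)` repeatedly would produce a collection of pseudo-averages of the kind used in the resampling step, with `data` a list of `(N_J, N_A, D_J, D_A)` tuples for one value of *V*.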

### SID model and synthetic likelihood

The data in Table 1 is cross-sectional and cannot be immediately used to analyze the within-household transmission pattern. Nevertheless, the generative representation via (1) allows for valid statistical inference indirectly, using the idea of synthetic likelihood akin to that proposed in [18]. Note that if we consider the sample from (1) as a set of independent realizations of some stationary counting process, then, by a version of the central limit theorem, we could expect its mean to approximately follow the normal distribution centered at a stationary solution of a certain ODE system (see [23], chapter 5). For the particular problem in hand, it is natural to take the ODE system to be one describing a compartmental SID (susceptible-infected-diseased) model defined below. Accordingly, the fitted generative model (1) may be used to generate *n* independent batches of *M* pseudo-data (denoted *X*_{obs}, see below) with corresponding *n* averages (denoted *X̄*_{obs}, see below) following a normal distribution with mean determined by the stationary SID system of ODEs.

In order to describe the SID model and introduce the required notation, denote the household-observed number of non-symptomatic and non-infected adults (resp. juveniles) by *S*_{A} (resp. *S*_{J}), the non-symptomatic but already pathogen-infected adults (resp. juveniles) by *I*_{A} (resp. *I*_{J}), and the symptomatic, or diseased, either infected or non-infected, adults (resp. juveniles) by *D*_{A} (resp. *D*_{J}). The complete data for a given household with environment *V* ∈ {0, 1} comprises the vector *X* = (*S*_{J}, *I*_{J}, *D*_{J}, *S*_{A}, *I*_{A}, *D*_{A}, *V*) although in practice (due to lack of symptoms among *I*’s) only the vector *X*_{obs} with the aggregated counts of the non-symptomatic individuals, *S*_{J} + *I*_{J} and *S*_{A} + *I*_{A}, as well as *D*_{A}, *D*_{J} and *V* is observable. Under these assumptions, the Maroua data (cf. Table 1) may be considered as the set of *M* = 63 independent observations of the random vector *X*_{obs}. We denote the empirical mean of *X*_{obs} based on *M* observations by *X̄*_{obs} and assume that it follows the normal distribution with mean given by the stationary compartmental SID model, as summarized in Table 2 and Fig 2. Since in the actual dataset only a single vector *X̄*_{obs} is available, we generate additional mean vectors from the *pseudo-data* using (1) as described above. As seen in Table 2, depending on the status of contamination (*V* ∈ {0, 1}), our SID model is parametrized by the vector **θ**_{V} of 12 (*V* = 0) or 14 (*V* = 1) parameters. As before, we suppress the subscript *V* in what follows and write **θ** = (*β*_{JJ}, *β*_{JA}, *β*_{AA}, *β*_{AJ}, *Vϕ*_{J}, *Vϕ*_{A}, *γ*_{J}, *γ*_{A}, *α*_{J}, *α*_{A}, *ν*_{J}, *ν*_{A}, *δ*_{J}, *δ*_{A}) to denote the appropriate rates of transmission and recovery/infection between different model compartments and types.
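The normal approximation described above can be sketched as a Gaussian log-likelihood of the pseudo-averages around a model-implied stationary mean. The sketch below is illustrative only and rests on assumptions not stated in the text: the components of each pseudo-average are treated as independent, their variances are plugged in from the pseudo-averages themselves, and the function name `synthetic_loglik` is hypothetical. The exact likelihood used in the analysis is given in S1 Appendix.

```python
import math
import statistics


def synthetic_loglik(pseudo_means, model_mean):
    """Gaussian log-likelihood of pseudo-averages around a candidate mean.

    pseudo_means: list of n equal-length tuples, one per simulated batch.
    model_mean:   tuple of stationary means implied by a candidate
                  parameter vector (supplied by the caller).
    """
    columns = list(zip(*pseudo_means))
    # Plug-in variance for each component (assumed independent here).
    variances = [statistics.variance(col) for col in columns]
    total = 0.0
    for row in pseudo_means:
        for x, mu, var in zip(row, model_mean, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total
```

A candidate mean close to the bulk of the pseudo-averages receives a higher value than one far from it, which is the quantity an MCMC sampler would compare between the current and proposed parameter values.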

Solid lines denote transitions within compartments. Dashed lines indicate transitions due to interactions (both within and across compartments) between susceptible (S) and infected (I) individuals.

The graphical representation of the network is provided in Fig 2 and the corresponding ODE system in (A.2) in S1 Appendix.

As summarized in Table 2, for *i*, *j* ∈ {*A*, *J*}, *β*_{ij} denotes the rate at which *S*_{i}, through interaction with *I*_{j}, converts into *I*_{i}; *Vϕ*_{i} denotes the rate at which infected environment (*V* = 1) converts *S*_{i} into *I*_{i} (*Vϕ*_{i} = 0 for *V* = 0); *α*_{i} denotes the rate at which *S*_{i} converts into *D*_{i} and *δ*_{i} is the rate of the reverse conversion. Finally, *ν*_{i} denotes the rate at which *I*_{i} progresses to *D*_{i} and *γ*_{i} − *ν*_{i} denotes the rate at which *I*_{i} returns back to *S*_{i}. The graphical diagram of all the transitions and interactions in Table 2 is presented in Fig 2.
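For orientation, one mass-action reading of these transitions among the susceptible (*S*), infected (*I*) and diseased (*D*) compartments is written out below for *i* ∈ {*A*, *J*}. This is a sketch under the assumption of unscaled bilinear interaction terms; the authoritative system is (A.2) in S1 Appendix and may differ in scaling or in other details.

$$
\begin{aligned}
\dot S_i &= -\Big(\sum_{j \in \{A, J\}} \beta_{ij} I_j + V\phi_i + \alpha_i\Big) S_i + (\gamma_i - \nu_i)\, I_i + \delta_i D_i,\\
\dot I_i &= \Big(\sum_{j \in \{A, J\}} \beta_{ij} I_j + V\phi_i\Big) S_i - \gamma_i I_i,\\
\dot D_i &= \alpha_i S_i + \nu_i I_i - \delta_i D_i.
\end{aligned}
$$

Under this reading the three right-hand sides sum to zero, so the compartment size *S*_{i} + *I*_{i} + *D*_{i} is conserved. Setting the derivatives to zero yields one balance condition involving only (*β*_{iJ}, *β*_{iA}, *Vϕ*_{i}, *γ*_{i}) and another involving only (*α*_{i}, *ν*_{i}, *δ*_{i}) for each compartment.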

The corresponding ODE system describing the SID dynamics is presented in (A.2) in S1 Appendix. Based on that system we may relate the generated pseudo-data to model parameters as follows. Consider the average number of household asymptomatic individuals in adult and juvenile groups and denote

Solving the SID model ODE for its steady state we obtain, on one hand,
(2)
and, on the other,
(3)
where the *f*’s are defined by their left-hand sides and we denote *θ*_{1} = (*β*_{JJ}, *β*_{JA}, *Vϕ*_{J}, *γ*_{J}), *θ*_{2} = (*β*_{AA}, *β*_{AJ}, *Vϕ*_{A}, *γ*_{A}), *θ*_{3} = (*α*_{J}, *ν*_{J}, *δ*_{J}), and *θ*_{4} = (*α*_{A}, *ν*_{A}, *δ*_{A}), so that **θ** = (*θ*_{1}, *θ*_{2}, *θ*_{3}, *θ*_{4}).

Note that the quantities and are derived from the pseudo-data obtained by sampling from the fitted model (1).

### Parameter estimation

Due to the relatively small size *M* of the dataset, we do not attempt to evaluate the variable *V* dynamically but instead consider two separate SID models for contaminated and uncontaminated environments (*V* = 1 and *V* = 0). In each case, in order to estimate the vector of parameters **θ** as well as two hidden states (*I*_{A}, *I*_{J}) based on the generated sample of *n* pseudo-averages, we employ an MCMC procedure. Its advantage is in being able to handle the latent (unobservable) variables and in providing a simple and intuitive way of validating the final model against observations in Table 1. The disadvantage is in a relatively high computational overhead due to a somewhat complicated Metropolis-within-Gibbs algorithm [24] described in Algorithm 1 below. Details on the forms of the conditional distributions are provided in S1 Appendix. To ease notation, let **θ**_{−k} denote the vector **θ** with its *k*-th component removed (*k* = 1, …, 4). Recall that when *V* = 0 the *ϕ* parameter is 0 and hence is excluded from *θ*_{1} and *θ*_{2}. We estimate parameters **θ** separately for *V* = 0, 1 via the following iterative procedure.

#### MCMC algorithm for SID model fitting.

1. Given the state of environment *V* ∈ {0, 1}, generate a collection of *n* pseudo-data points, each of them being an average of *M* independent household draws from (1) under the fitted parameters *η*_{V}.
2. Initiate values of the rate vector **θ** as well as *I*_{A}(*V*) and *I*_{J}(*V*) according to their prior distributions (see S1 Appendix).
3. Using the Metropolis-Hastings (MH) step, conditionally on (*I*_{A}, *I*_{J})(*V*) and the pseudo-data, draw sequentially samples from the conditional distributions of *θ*_{k}|**θ**_{−k}, *k* = 1, …, 4. The form of the proposal in the MH step as well as the forms of the conditionals are given in (A.4)–(A.7) in S1 Appendix.
4. Using the MH step, conditionally on **θ** and the pseudo-data, draw independently *I*_{A}(*V*) and *I*_{J}(*V*) using their conditionals as given in (A.8) and (A.9) in S1 Appendix.
5. Repeat steps 3 and 4 until convergence.
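The blockwise structure of the procedure above can be sketched in Python as follows. This is a generic Metropolis-within-Gibbs skeleton, not the authors' sampler: it assumes a symmetric Gaussian random-walk proposal and a user-supplied log-posterior, whereas the actual proposals and conditionals are those given in S1 Appendix. The function name, the `step` parameter and the toy target are hypothetical, and positivity constraints on rates are not handled.

```python
import math
import random


def metropolis_within_gibbs(log_post, init_blocks, n_iter, step=0.5, seed=0):
    """Update each parameter block in turn with a random-walk MH step.

    log_post:    maps a list of blocks (lists of floats) to a log-posterior
                 value known up to an additive constant.
    init_blocks: starting values, e.g. rate blocks followed by latent states.
    """
    rng = random.Random(seed)
    state = [list(b) for b in init_blocks]
    current = log_post(state)
    chain = []
    for _ in range(n_iter):
        for k in range(len(state)):
            proposal = [list(b) for b in state]
            proposal[k] = [x + rng.gauss(0.0, step) for x in state[k]]
            candidate = log_post(proposal)
            # Symmetric proposal: accept with probability min(1, ratio).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                state, current = proposal, candidate
        chain.append([list(b) for b in state])
    return chain


def toy_log_post(blocks):
    """Toy target: independent unit-variance normals centred at 1 and -2."""
    (x,), (y,) = blocks
    return -0.5 * (x - 1.0) ** 2 - 0.5 * (y + 2.0) ** 2


chain = metropolis_within_gibbs(toy_log_post, [[0.0], [0.0]],
                                n_iter=6000, step=1.0, seed=1)
```

In the setting of the paper the blocks would correspond to *θ*_{1}, …, *θ*_{4} and the latent pair (*I*_{A}, *I*_{J}), with `log_post` combining the priors and the normal likelihood of the pseudo-averages.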

In our analysis, we iterated the above MCMC procedure 40,000 times, retaining every 10th iteration for *V* = 0 and every 20th iteration for *V* = 1 in order to ensure good chain mixing. We also removed the first 20,000 iterations as a burn-in set and summarized the posterior statistics based on the remaining iterations. To check for the robustness of our analysis with respect to the amount *n* of the generated pseudo-data, we applied the MCMC algorithm above with *n* = 50 and *n* = 100; however, since the results were virtually identical, we only report below on the case *n* = 100. Although larger values of *n* could be also considered, this particular value seems to strike a good balance between required MCMC precision and computational overhead.
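The bookkeeping of burn-in and thinning can be made explicit with a short Python sketch. It assumes that thinning is applied to the iterations remaining after the burn-in is discarded; the helper name `retained_indices` is hypothetical.

```python
def retained_indices(n_iter, burn_in, thin):
    """Indices of iterations kept after dropping burn-in and thinning."""
    return list(range(burn_in, n_iter, thin))


kept_v0 = retained_indices(40_000, 20_000, 10)  # settings quoted for V = 0
kept_v1 = retained_indices(40_000, 20_000, 20)  # settings quoted for V = 1
```

The number of retained draws is `len(kept_v0)` or `len(kept_v1)` and depends on how the two operations are combined.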

#### Model validation.

The final step in our model estimation procedure was validation against the observed data. This was done by comparing the posterior distributions of the model generated data samples using estimated parameters with the actually observed values from *X*_{obs} and looking for large departures from the posterior mode.
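A simple numerical counterpart of this visual check is sketched below in Python. It swaps the comparison against the posterior mode for a check against a central interval of the model-generated values, which is an assumption of this sketch and not the procedure used in the paper; the function names are hypothetical.

```python
def central_interval(draws, level=0.95):
    """Empirical central interval of a list of simulated values."""
    ordered = sorted(draws)
    cut = int((1.0 - level) / 2.0 * len(ordered))  # values trimmed per tail
    return ordered[cut], ordered[len(ordered) - 1 - cut]


def is_consistent(observed, draws, level=0.95):
    """True if the observed value lies inside the central interval."""
    lower, upper = central_interval(draws, level)
    return lower <= observed <= upper
```

Here `draws` would hold the model-generated means for one compartment and `observed` the corresponding observed mean from *X*_{obs}.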

#### Software.

The R-scripts for carrying out our computational analysis described above along with the Maroua dataset adapted from [22] are available at https://github.com/cbskust/SID.

## Results

The initial set of fitted parameters obtained for the generative model (1) based on the *M* = 63 Maroua households dataset is provided in Table 3. As can be seen from the entries in the table, an interesting feature of this dataset appears to be that the probability of diarrhea in the juvenile compartment is *decreased* in the households with contaminated water environment (*V* = 1). There may be several reasons for this finding which appears inconsistent with other reported observational studies [25]. First, the survey data for juveniles may be less reliable than for adults, particularly in young children who under our definition are also a part of the juvenile compartment. Second, it is known [26] that a substantial number of juvenile diarrhea cases is, in fact, unrelated to the waterborne infections and the collected data may be simply confounded with this unrelated process. Finally, it is also possible that the contaminated environment offers some measure of immunity from diarrhea, perhaps due to non-specific activation of the immune system [26].

Estimates of *λ* are pooled across *V* values.

The numerical results of the MCMC-based fitting of **θ** for the SID model under both *V* = 0 and *V* = 1 are summarized in Table 4, where we list the posterior means, posterior standard deviations, and 95% credible intervals (CIs) based on the generated *n* = 100 pseudo-data points and 2000 thinned posterior samples. Complementing the table entries, the full sets of marginal densities and trace plots for the posterior distributions are provided in S1–S4 Figs of the Supporting Information.

Although we opted not to conduct the direct comparison of the parameter values in **θ** between *V* = 0 and *V* = 1, one may somewhat informally perform such a comparison based on the CI entries in Table 4. In general, if for a particular parameter in **θ** its CI bounds under *V* = 0 are contained within the CI bounds under *V* = 1, or vice-versa, one would consider the corresponding posterior distributions as statistically (i.e., for given data) equal. To facilitate such analysis, in Table 4 the parameters with statistically distinct posterior distributions are entered in bold. From the entries in Table 4 it therefore follows that although the posterior distributions of the transmission rates *β*_{JJ} and *β*_{AJ} are statistically different between *V* = 0 and *V* = 1, it is not so for the remaining rates *β*_{AA} and *β*_{JA}. Similarly, we find that although the average number of silent infections among juveniles under *V* = 0 and *V* = 1 (mean *I*_{J}(0) vs mean *I*_{J}(1)) is not statistically different, this is not the case for the average number of silent infections among adults, despite the smaller absolute difference of their means. (This particular finding appears to be due to the relatively large value of the posterior standard deviation of *I*_{J}(1).) Similar comparisons may be also performed between the recovery rates. Indeed, we find that while the recovery rate in the adult compartment is significantly slowed down in the contaminated environment (mean *δ*_{A}(1) = 0.7880 vs mean *δ*_{A}(0) = 0.6314), the rate in the juvenile compartment is not significantly changed.
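The informal containment rule used in this comparison can be written as a short Python sketch. The function names are hypothetical and the intervals in the usage line are made-up numbers for illustration, not entries of Table 4.

```python
def nested(ci_a, ci_b):
    """True if interval ci_a lies entirely inside interval ci_b."""
    return ci_b[0] <= ci_a[0] and ci_a[1] <= ci_b[1]


def statistically_distinct(ci_v0, ci_v1):
    """Informal rule: distinct unless one CI contains the other."""
    return not (nested(ci_v0, ci_v1) or nested(ci_v1, ci_v0))


# Illustrative (made-up) intervals: the first lies inside the second.
example = statistically_distinct((0.2, 0.4), (0.1, 0.5))
```

Note that under this rule two intervals that merely overlap, without one containing the other, are still classified as distinct.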

From the viewpoint of waterborne disease, the most interesting are perhaps the estimates of the water contamination effects on household diarrhea persistence in different compartments. In Table 4 our SID model quantifies the effect of water contamination (*V* = 1) in the households on average as *ϕ*_{A} = 0.5318 and *ϕ*_{J} = 0.5159. This indicates that despite the differences in the diarrhea prevalence patterns among juveniles and adults (see Table 3), the overall effect of waterborne pathogens is quantitatively similar. Note that the simple estimates in Table 3 which are based on the survey data (Table 1) and ignore the SID dynamics and asymptomatic infections suggest otherwise (cf. also [12]). Note also that according to the SID model the average number of infectious individuals (both pre-symptomatic and never-symptomatic) is larger in the contaminated environment, with the observed difference being significant in the adult compartment. Moreover, Eqs (2) and (3) for the specific values in Table 4 indicate that remediating contaminated water environment in the household (moving from *V* = 1 to *V* = 0) is likely to remove the symptomatic cases in the average adult compartment but not so in the juvenile one. Finally, let us also note that the observed higher prevalence of diarrhea among juveniles (Table 3) in the clean environment may be explained on the basis of our SID model by the higher transmission in the juvenile compartment (*β*_{JJ}(0) is significantly greater than *β*_{JJ}(1)) and an increase in non-pathogen/non-household diarrhea (increased *α*_{J}).

The results of model validation are shown separately for the *A* and *J* compartments in Fig 3 where the numerical values of the means of *X*_{obs} (vertical lines) are plotted along with the corresponding histograms of their posterior distributions obtained from the model with estimated parameters. As seen from the plots, the observed values are within a reasonable range of the posterior mode and hence may be considered in agreement with the fitted model. This also implies that the CI bounds in Table 4 may be interpreted as plausible ranges of respective parameter values consistent with the observed data. These ranges are quite wide, indicating somewhat large uncertainty, likely due to the moderate sample size (*M* = 63).

## Summary and discussion

In many observational disease studies we lack the ability to collect repeated measurements over time, either due to cost or practicality considerations. Consequently, disease transmission studies often have to rely on cross-sectional data containing latent variables and multiple confounders (e.g. latent infections or different disease susceptibility across population). For such data we have proposed here a statistical method for direct analysis of the transmission rates across different population compartments and different environmental risk factors. The ideas for statistical analysis came from the consideration of a stationary SID model based on differential equations and synthetic likelihood with an MCMC algorithm for estimating parameters. The proposed estimation method appears to be quite stable and capable of converging in a relatively large parameter space (in our example we had up to 16 parameters) even when supplied with only slightly informative prior distributions and a moderate sample size.

Applying a modern Bayesian approach to fit the SID transmission model allowed us to better account for the uncertainty of various model components (i.e., bias or lack of accuracy) as well as the uncertainty of outcome predictions (i.e., variance or lack of precision). It also allowed us to naturally incorporate any additional information about the model parameters. For instance, should some of the estimated compartmental diarrhea probabilities be fixed at specific values (say, based on prior studies) the fitting algorithm could easily incorporate this additional information. In such a case one would expect to see both the model’s precision and accuracy increase. We also note that in our example dataset the posterior marginal distributions of the parameters were all unimodal, indicating that the model parameters were identifiable, that is, their joint posterior distribution had a unique mode contained in the range of plausible parameter values given the observed data. In general, our proposed statistical approach may be viewed as an alternative to a more traditional epidemiological disease risk analysis based on the odds ratios, where the Cochran-Mantel-Haenszel (CMH) stratification method is typically used to adjust for confounders.

The example dataset we have chosen was part of a larger study investigating possible links between drinking water contamination and diarrheal diseases in the urban environment of Central/Sub-Saharan Africa [12, 22]. Although this particular study did not specifically examine other factors associated with gastrointestinal infections (socioeconomic status, overall sanitation, household education, storage, etc.), they likely did contribute to the observed baseline (not water-related) occurrence. However, our statistical analysis indicated that in our dataset they constituted only a small minority of the observed symptomatic cases.

In order to better appreciate the possible implications of SID-type analysis for public health policies and interventions, it is helpful to compare its results (Table 4) with the results from the initial, purely descriptive analysis of the Maroua dataset (Table 3) akin to that conducted previously in [22]. We note that since the descriptive analysis in Table 3 is based on risk comparison (i.e., the binomial probabilities) it provides only an aggregated measure of the water contamination effect on the prevalence of diarrhea. It is not clear in particular what specific transmission pathways should be targeted for intervention in order to minimize the observed occurrence (note that the juvenile risk is actually smaller in contaminated households). In contrast, the SID analysis in Table 4 provides (via Eqs (2) and (3)) explicit numerical relations between transition rates and occurrence, and therefore a comprehensive picture of competing household transmission risks. Consequently, the SID analysis allows for a more detailed examination of how household occurrence risk is associated with the water environment and how it is transferred across age compartments. Such information appears essential for developing more targeted water intervention strategies beyond those that are currently recommended by WHO (see [2], Section 11.3) for reducing diarrhea risk.

## Supporting information

### S1 Fig. Marginal plots for the posterior parameters of the SID model under *V* = 1 and *n* = 100 with 2,000 iterations.

https://doi.org/10.1371/journal.pone.0206418.s001

(TIF)

### S2 Fig. Marginal plots for the posterior parameters of the SID model under *V* = 0 and *n* = 100 with 2,000 iterations.

https://doi.org/10.1371/journal.pone.0206418.s002

(TIF)

### S3 Fig. Diagnostic trace plots for the posterior parameters of the SID model under *V* = 1 and *n* = 100 with 2,000 iterations.

https://doi.org/10.1371/journal.pone.0206418.s003

(TIF)

### S4 Fig. Diagnostic trace plots for the posterior parameters of the SID model under *V* = 0 and *n* = 100 with 2,000 iterations.

https://doi.org/10.1371/journal.pone.0206418.s004

(TIF)

### S1 Appendix. Appendix on statistical analysis.

Contains additional formulas and derivations related to the statistical analysis.

https://doi.org/10.1371/journal.pone.0206418.s005

(PDF)

### S1 Data. Maroua household data.

Diarrhea symptoms data from 63 households in Maroua, Cameroon. The file lists the household ID, the numbers of juveniles (J) and adults (A), and the respective numbers of symptomatic individuals (DJ and DA).

https://doi.org/10.1371/journal.pone.0206418.s006

(CSV)

## Acknowledgments

The authors thank Seungjun Lee for helping them organize and interpret the data and Will Gehring for helping with some manuscript figures. They are also indebted to the reviewers for their helpful comments on the earlier draft of the paper.

## References

- 1. World Health Organization. Diarrhoeal Disease Fact Sheet; May 2017. http://www.who.int/mediacentre/factsheets/fs330/en/
- 2. World Health Organization. The Treatment of Diarrhoea. A manual for physicians and other senior health workers. 4th edition, 2005. http://www.who.int/mediacentre/factsheets/fs330/en/
- 3. Julian TR. Environmental transmission of diarrheal pathogens in low and middle income countries. Environ Sci Process Impacts. 2016 Aug 10; 18(8):944–55. pmid:27384220
- 4. Headey D, Hirvonen K. Is exposure to poultry harmful to child nutrition? An observational analysis for rural Ethiopia. PLoS One. 2016 Aug 16; 11(8):e0160590. pmid:27529178
- 5. Sudfeld CR, McCoy DC, Danaei G, Fink G, Ezzati M, Andrews KG, Fawzi WW. Linear growth and child development in low and middle-income countries: a meta-analysis. Pediatrics. 2015 May; 135(5):e1266–75. pmid:25847806
- 6. Zambrano LD, Levy K, Menezes NP, Freeman MC. Human diarrhea infections associated with domestic animal husbandry: a systematic review and meta-analysis. Trans R Soc Trop Med Hyg. 2014 Jun; 108(6):313–25. pmid:24812065
- 7. Ownby DR, Johnson CC, Peterson EL. Exposure to dogs and cats in the first year of life and risk of allergic sensitization at 6 to 7 years of age. JAMA. 2002; 288(8):963–972. pmid:12190366
- 8. Braun-Fahrländer C, Gassner M, Grize L, Neu U, Sennhauser FH, Varonier HS, Vuille JC, Wüthrich B. Prevalence of hay fever and allergic sensitization in farmer’s children and their peers living in the same rural community. Clin & Experimental Allergy. 1999; 29(1):28–34.
- 9. Waser M, Von Mutius E, Riedler J, Nowak D, Maisch S, Carr D, Eder W, Tebow G, Schierl R, Schreuer M, Braun-Fahrländer C. Exposure to pets, and the association with hay fever, asthma, and atopic sensitization in rural children. Allergy. 2005; 60(2):1398–9995.
- 10. Simpson A, Custovic A. Pets and the development of allergic sensitization. Current Allergy & Asthma Reports. 2005; 5(3):212–220.
- 11. Schmidt WP, Arnold BF, Boisson S, Genser B, Luby SP, Barreto ML, Clasen T, Cairncross S. Epidemiological methods in diarrhoeal studies - an update. Int J Epidemiol. 2011 Dec; 40(6):1678–92.
- 12. Healy-Profitos J, Lee S, Mouhaman A, Garabed R, Moritz M, Piperata B, Lee J. Neighborhood diversity of potentially pathogenic bacteria in drinking water from the city of Maroua, Cameroon. J Water Health. 2016; 14(3):559–570. pmid:27280618
- 13. Medina DC, Findley SE, Guindo B, Doumbia S. Forecasting non-stationary diarrhea, acute respiratory infection, and malaria time-series in Niono, Mali. PLoS One. 2007; 2(11):e1191.
- 14. Watson CH, Edmunds WJ. A review of typhoid fever transmission dynamic models and economic evaluations of vaccination. Vaccine. 2015; 33:C42–C54. pmid:25921288
- 15. Merler S, Ajelli M, Fumanelli L, Gomes MFC, Piontti AP, Rossi L, Chao DL, Longini IM, Halloran ME, Vespignani A. Spatiotemporal spread of the 2014 outbreak of Ebola virus disease in Liberia and the effectiveness of non-pharmaceutical interventions: a computational modelling analysis. The Lancet Infectious Diseases. 2015; 15(2):204–211.
- 16. Kotloff KL, Blackwelder WC, Nasrin D, Nataro JP, Farag TH, van Eijk A, Adegbola RA, Alonso PL, Breiman RF, Faruque ASG. The Global Enteric Multicenter Study (GEMS) of diarrheal disease in infants and young children in developing countries: epidemiologic and clinical methods of the case/control study. Clinical Infectious Diseases. 2012; 55:S232–S245. pmid:23169936
- 17. Saidi SM, Iijima Y, Sang WK, Mwangudza AK, Oundo JO, Taga K, Aihara M, Nagayama K, Yamamoto H, Waiyaki PG. Epidemiological study on infectious diarrheal diseases in children in a coastal rural area of Kenya. Microbiology and Immunology. 1997; 41(10):773–778. pmid:9403500
- 18. Wood SN. Statistical inference for noisy nonlinear ecological dynamic systems. Nat Letters. 2010 Aug; 466:1102–07.
- 19. Hartig F, Calabrese JM, Reineking B, Wiegand T, Huth A. Statistical inference for stochastic simulation models and application. Ecology Letters. 2011; 14:816–27. pmid:21679289
- 20. Kurtz TG, Ethier S. Markov Processes: Characterization and Convergence. Wiley. 2005.
- 21. Brooks S, Gelman A, Jones GL, Meng X. Handbook of Markov Chain Monte Carlo. Chapman & Hall. 2011.
- 22. Healy-Profitos J, Mouhaman A, Lee S, Garabed R, Moritz M, Piperata B, Tien J, Bisesi M, Lee J. Muddying the Waters: A New Area of Concern for Drinking Water Contamination in Cameroon. Int J Environ Res Public Health. 2014; 11(12):12454–12472.
- 23. Andersson H, Britton T. Stochastic Epidemic Models and their Statistical Analysis. Springer. 2000.
- 24. Gilks WR, Best NG, Tan KKC. Adaptive Rejection Metropolis Sampling within Gibbs Sampling. J R Stat Soc Series C (Appl Stat). 1995; 44(4):455–472.
- 25. Garrett V, Ogutu P, Mabonga P, Ombeki S, Mwaki A, Aluoch G, Quick RE. Diarrhoea prevention in a high-risk rural Kenyan population through point-of-use chlorination, safe water storage, sanitation, and rainwater harvesting. Epidemiol Infect. 2008; 136(11):1463–1471. pmid:18205977
- 26. Brown J, Cairncross S, Ensink JHJ. Water, sanitation, hygiene and enteric infections in children. Arch Dis Child. 2013 Aug; 98(8):629–634. pmid:23761692