Figures
Abstract
Background
Schistosomiasis is a neglected disease prevalent in tropical and sub-tropical areas of the world, especially in Africa. Detecting the presence of the disease is based on the detection of the parasites in the stool or urine of children and adults. In such studies, typically, data collected on schistosomiasis infection includes information on many negative individuals leading to a high zero inflation. Thus, in practice, counts data with excessive zeros are common. However, the purpose of this analysis is to apply statistical models to the count data and evaluate their performance and results.
Methods
This is a secondary analysis of previously collected data. As part of a modelling process, a comparison of the Poisson regression, negative binomial regression and their associated zero inflated and hurdle models were used to determine which offered the best fit to the count data.
Results
Overall, 94.1% of the study participants did not have any schistosomiasis eggs out of 1345 people tested, resulting in a high zero inflation. The performance of the negative binomial regression models (hurdle negative binomial (HNB), zero inflated negative binomial (ZINB) and the standard negative binomial) were better than the Poisson-based regression models (Poisson, zero inflated Poisson, hurdle Poisson). The best models were the ZINB and HNB and their performances were indistinguishable according to information-based criteria test values.
Citation: Nketia K, de Souza DK (2024) Using zero-inflated and hurdle regression models to analyze schistosomiasis data of school children in the southern areas of Ghana. PLoS ONE 19(7): e0304681. https://doi.org/10.1371/journal.pone.0304681
Editor: Jean Coulibaly, Universite Felix Houphouet-Boigny, SWITZERLAND
Received: November 17, 2023; Accepted: May 16, 2024; Published: July 12, 2024
Copyright: © 2024 Nketia, de Souza. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files. The python, R codes and data for model comparison and analysis can be found at https://github.com/Kojo-Nketia/Using-hurdle-models-to-analyze-schistosomiasis-count-data.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Schistosomiasis is a parasitic disease caused by trematode parasites of the genus Schistosoma [1]. It affects approximately 251.4 million people globally across 78 countries, with an estimated cases of 90% in Africa [2, 3]. The main causes of schistosomiasis are Schistosoma haematobium and S. mansoni [4]. The presence of S. haematobium results in genitourinary disease, while S. mansoni causes hepatobiliary schistosomiasis. Complications of schistosomiasis may include liver fibrosis, varices, and bladder carcinoma. Additionally, the infection is linked to anemia, deficiencies in nutrition, and stunted growth. It also has negative impacts on cognitive development, reducing physical activity, school performance, work capacity, and productivity [5]. Schistosomiasis results in high morbidity and socio-economic burden to affected individuals and communities, with risk factors including poor sanitation, prolonged exposure to infested freshwater bodies, and occupational hazards due to farming and fishing [6, 7].
In order to show the full burden of infection due to schistosomiasis, it is crucial to understand the epidemiology of the disease. The number of parasitic worms present in a group of people determines the transmission intensity of schistosomiasis, which is indirectly measured by the quantity of discharged eggs in stool or urine [8]. Appropriate statistical models can be used to study the epidemiology of infection. However, while using statistical techniques to analyze count data, methodological issues and the nature of the data must be taken into consideration [9].
Count data are common in health science research, especially when researchers investigate topics such as parasitic infections, COVID 19 among students, and the frequency of serious incidents in medical laboratories [10]. Count data can assume a probability distribution such as a Poisson. Count data, however, can also display more variability known as overdispersion (i.e., variance >mean) [11]. Often, a Poisson or negative binomial (NB) model cannot adequately account for the sample’s number of zeros and may result in distorted test statistics, goodness-of-fit and estimated standard errors [12]. Zero-inflated (ZI) and hurdle models are two models that are often used [9], to account for zero-inflated datasets. Further, hurdle models are an extension of the zero inflated (ZI) models. Hurdle models are also known as two part models [13], where the first part is a Bernoulli (for zero and non-zero counts) probability and the second part, which handles positive counts, is the zero-truncated Poisson (ZTP) or zero-truncated negative binomial (ZTNB) distribution [10]. Previous studies have demonstrated that if excessive zeros are not taken into consideration, both the zeros and nonzero counts would fit poorly [14].
The approach used to model zero counts is the primary difference between ZI and hurdle models. Zero observations in a ZI model [15] might come from either “sampling zeros” that are part of the underlying sampling distribution (Poisson, or negative binomial) and “structural zeros” that cannot score anything other than zero [16]. That is to say, the total number of zeros are divided into excess zeros and zeros generated by the distribution. Whereas the ZI models assume that both sources of zeros are involved, hurdle models assume only structural sources [10]. The typical Poisson or negative binomial distribution is used for the sampling of zeros, and it is presumed that they occur by chance [17]. This is the nature of the schistosomiasis count data, especially in low prevalence settings.
The purpose of this study is to assess the performance of the statistical models Poisson, NB and their related ZI and hurdle models in relation to different set of predictors and infection intensity for schistosomiasis to inform the effective analysis of data.
Materials and methods
This is a secondary analysis of de-identified data collected within a larger epidemiological study on allergy and parasitic infections (“GLOFAL GHANA”) [18, 19]. The data constitute 1345 participants (635 male, 710 female)—within 12 communities in southern Ghana. These communities were grouped into two areas; rural and urban. The age range of participants was 4–21 years, distributed from nursery & kindergarten (considered as pre-school), primary and junior high school. The ages were grouped into three age groups with group one consisting of children aged 4–9, group two consisting of children aged 10–15 and group three consisting of children aged 16–21. Parents’ occupations were also grouped into the main occupational activities of farming and fishing, and others (including teaching, driving, masonry, trading, etc). Water sources used by communities are pipe borne water, tanker (treated), tanker (untreated), river/stream, well/borehole. The count data is available in S1 Appendix in the supporting information section below.
Demographic covariates
The covariates considered for this analysis are the age group, sex, educational level, area (i.e. rural or urban), the occupation of parents, and sources of water usage. These covariates have been identified as predictors for schistosomiasis infection informed by literature [20–22]. These covariates, also taken as predictors, formed the regression matrix for the models used. The information about these covariates have been listed in Table 1.
Statistical methods
In a generalized linear model (GLM), the response variable is related to a linear combination of the regression variables through a non-linear link function [23],
(1)
Poisson regression is a type of GLM used to model discrete count data whereby the link function is exponential and E(y|x) ≔ λ represents the expected number of counts given X and is expressed as,
(2)
Inverting this the problem reduces to standard linear regression against the logarithm of the mean [24],
(3)
where b0, b1, …, bn are the intercepts and slopes we fit to minimize the residual of the errors. The probability of observing the count yi is Poisson distributed as per the following probability mass function:
(4)
For more complex models, where the Poisson assumption does not hold, there may be more parameters such as overdispersion parameter α, or zero-truncation probability π, each of which can be fitted with a different link function and in a similar way [17].
For a given mean, the negative binomial distribution has a higher variance than the Poisson distribution [25]. Its probability mass function is expressed as
(5)
The mean and variance of the negative binomial distribution are E(y) = λ, and Var(y) = λ(1 + αλ), where α is the dispersion parameter and if α = 0, negative binomial approaches the Poisson [24].
Zero-inflated models introduce extra probability mass to account for the excess zeros in the outcome. This results in a two-state mixture distribution with a PMF defined by the ZIP model;
(6)
In a case where overdispersion and excess zeros occur in data, recommending the search for an alternate model to the ZIP model. The zero inflated negative binomial (ZINB) is given by the equation,
(7)
where yi is the observed counts and π is the probability of non-infection. Eq (7) models two processes, where the first process generates the excess zeros, with the probability π, and the second one uses the negative binomial to generate counts (including zeros).
In contrast, the hurdle models also have two components which are the zero count and the positive counts for both the Poisson and negative binomial model [9]. The first component handles the zero count and the other part deals with the positive counts which is referred to as the zero truncated Poisson or NB [17]. In Eqs (8) and (9) below, (1 − π) is the likelihood of overcoming the “hurdle” and generating a non-zero count [26]. The hurdle Poisson model is expressed as
(8)
For the negative binomial it would look as follows:
(9)
where yi is the observed counts taking values y = 0, 1, 2, ‥ and π is the probability of non-infection. Also, the second equation in the piecewise function of Eqs (8) and (9) are the zero-truncated Poisson and zero-truncated negative binomial distribution respectively.
The parameters of each model under different socio-demographic predictors were estimated by using maximum likelihood estimation (MLE) to simulate the model. The binary component, which determines if an observed count is zero (negative infection) or non-zero (positive infection), of the HNB model uses a logit binary link [10],
(10)
and the schistosomiasis positive counts as a dependent variable for our proposed regression model is
(11)
where X is the regression variable matrix, γ0, γ, β0 and β are the vectors of parameters.
Models comparison
Several tests have been used for comparing models. An example is the Vuoung test [27] for non-nested models, such as ZINB and HNB, in several applications. Alternatives include the Akaike, Bayesian information-theoretic criteria and an approach that embeds the alternative models in an artificial compound model [26, 28]. We used the Akaike and Bayesian information criteria to understand the model performances against each other and discriminate against other tests. The Akaike Information Criteria (AIC) is likely to perform well when heterogeneity is small. However, when heterogeneity is large, which may result in overfitting, the Bayesian information criterion (BIC) will often perform better because of the stronger penalty provided [29]. By adding a penalty term for the number of parameters in the model, the BIC solves this issue. The AIC and BIC were used to test the goodness-of-fit of the models [30]. The mathematical formulation of the AIC/BIC with their basic properties is presented in Jiawei’s paper [31]. Better model fit is indicated by a lower AIC or BIC value indicating the data is more likely to have derived from the distribution in question. Both tests are based on the MLE method. The formula for the AIC is given as follows:
where
represents the log-likelihood of the data under the model, and k is the number of model parameters that is the number of variables and the intercept in the model [32]. The BIC is formally defined as
where
is the maximized value of the likelihood function, n is the sample size and k is the number of parameters estimated by the model.
Models compared are Poisson, NB, ZIP, ZINB, HP and HNB and if the discrepancy in the AIC or BIC values between competing models is less than 2.5, they cannot be distinguished, whereas a difference exceeding 10 (>10) indicates substantial evidence in favour of the model with the lowest criterion [9]. However, if the difference between two models lies in the range 2.5–6 then the preferred model is that with the smallest value if the sample size n >256. In the same light, if the difference between the models is in the range 6–9 then the preferred the model is the one with the lowest value if also n >64 [30].
The statsmodels, pandas and matplotlib libraries in python version 3.11.8 and political science computational laboratory (PSCL) package in the statistical software R version 4.3.1 were used for model fitting [33] and plotting figures.
Measures of association
Modelling was done using the most significant variables from the cross-tabulation with the chi-square test. However, the output results of each model include estimated p-values representing the association of a predictor, in relation with other predictors, to the dependent variable (schistosomiasis counts). So, we tabulated infection status paired with each of the chosen predictor individually, aiming to investigate association separately with the infection status, having a binary outcome (positive or negative).
Results
Descriptive statistics
A large percentage (94.1%) of the count data were zeros (non-infected individuals), with the remainder being positive counts of infected individuals. The median age of participants was 11 years. The mean of schistosomiasis egg count, excluding the zeros, was 33.4 egg per gram, with a standard deviation of 61.8 and an over-dispersion parameter α = 2.1. The distribution of the count data is shown in Fig 1A.
A: Histogram showing the distribution of egg counts for schistosomiasis; with sample size 1345. B: Histogram showing the distribution of egg counts for schistosomiasis in log axis. The divided regions; light (1–99 epg), moderate (100–400 epg) and high (>400 epg), equate the infection intensity count categories.
Grouping by infection intensity
Light, moderate, and high infection intensity categories are defined as counts between 1–99, 100–399, ≥400 epg respectively [34]. From the histogram of these counts given in Fig 1B, although the distribution is positively skewed there are a few gaps between counts after the 100 mark on the horizontal axis (or light intensity range). Due to this we took the low intensity count of the data with zeros (n = 1336, mean = 0.8 epg, variance = 33.3) and the “all intensity” count data (i.e. all sample) to evaluate the model performances. Upon excluding the moderate (n = 9, mean = 183.6 epg, variance = 5275) with no value in the high intensity range, the mean and variance of the low intensity sample became smaller than that of the actual mean (2 epg, including the zeros) and variance (286.7) of the count data but still resulted in over-dispersion (variance >mean) with the overdispersion parameter α = 74.0. We assessed how the model performed by neglecting moderate and high intensity values (>100 epg) with the same predictors and made a comparison between both levels of intensity. The results of this assessment, presented in Table 2, show a significant difference in both information criteria values between the two levels of intensity. An approximated difference, between using low intensity and all intensity values, of 200 was observed, against all levels of intensity, implying a better performance by the low intensity values compared to using all values.
From Table 2, it is observed that the difference between the AIC and BIC values of the ZINB and HNB model for using five predictors scenario is ≤3. Hence their performance is indistinguishable (AIC = 1206, BIC = 1305 for ZINB; AIC = 1203, BIC = 1302 for HNB). Also, a difference ≤3 between ZINB and HNB was observed using the ten predictors (all intensity) and the implication of this result is that, the ZINB model did not outperformed the HNB model under this scenario (AIC = 1209, BIC = 1360 for ZINB; AIC = 1207, BIC = 1358 for HNB). Overall, the NB and its associated models (NB, ZINB, HP) performed better than the Poisson-based models (Poisson, ZIP, HP) as shown in Fig 2 where a greater difference between the NB-based model’s AIC values and the Poisson-based models.
Comparison of the AIC values between Poisson and NB-based regression models (the standard, ZI and hurdle). This graph displays the AIC values for each model under different scenarios; A. shows all intensity (i.e. all sample (n = 1345)) simulated with 10 predictors. B. presents all sample size simulated with 5 predictors. C shows the low intensity samples (n = 1336) simulated with 10 predictors. D shows low intensity samples simulated with 5 predictors. For each sub-figure, the AIC value for the negative binomial-based models is much greater than its corresponding Poisson-based model.
Using different set of predictors
Different sets of variables were used to determine the performance of the model. The first five predictor variables were age groups, sex, educational level, area and parent’s occupation. The next ten predictor variables used were the first five in addition to their sources of water namely pipe-borne water, treated water from water tanker, untreated water from water tankers, river/stream and well/borehole. Table 2 shows their respective performance on the model. As a result, the AIC and BIC values for using five predictors are greater than that of the ten predictors for each model and scenario except for the HNB model results. For the HNB test values, using all intensity, the difference in the five predictor’s AIC value (1203) and ten predictors AIC value (1207) is 4. Hence, the best fit is in favour of both using both predictors. Similarly, the difference in their BIC values is 56, favouring the use of the 10 predictors (which has the lowest BIC value). However, the p-value of the likelihood ratio test between using the five and ten predictor for the HNB is not <0.05, concluding that compared to the 5 predictors, the 10 predictors did not give a significant fit improvement.
Interpreting model fitting
To evaluate the fit of regression models, individuals regularly examine the deviations of observations from predicted values using many approaches. One way to achieve this is when the observed frequencies are plotted against or overlaid on the predicted/expected values. Rootograms, as proposed by Kleiber and Zeileis [35] is a new way to assess count models, such that it presents the square roots of the predicted values as a continuous curve overlaid on that of the bars of observed counts. Fig 3 simply shows the rootogram for the first 50 number of eggs’ frequencies. Further, Fig 4 shows the differences between observed and expected counts, with bars hanging above and below from the zero line to highlight overestimated and underestimated values respectively, providing information about the model residuals rather than the fitted values shown in Fig 3.
The hurdle Poisson and hurdle negative binomial both captured 1265 zeros, which is equal to the observed zero count. Additionally, zero captured for zero inflated Poisson (ZIP) is 1265, zero inflated negative binomial (ZINB) is 1264 but the Poisson and NB model underestimated the zeros (513 and 412 zeros respectively) and overestimated the positive counts. The ZIP and HP expected values were lower than the observed for the first 10 counts whilst ZINB and HNB expected values were almost equal to the observed with a difference by a small margin.
Here, the but not
. The standard Poisson and negative regression models did not account for all the zeros observed; with zeros difference of 513 and 412 for Poisson and negative binomial from the number of observed zeros (low prediction). Moreover, the zero-inflated and hurdle models accounted for all zeros. However, we observe an overestimation for the positive counts for the standard Poisson and both underestimation and overestimation for the other models.
Figs 3 and 4 highlight that the Poisson model underestimated the zeros (752 zeros captured) and overestimated the positive counts, with a difference of 211 for positive count 1, 90 for count 2, 54 for count 3 and so on; whilst the negative binomial model predicted less zeros (853 captured; overestimating) and positive counts which were overestimated (positive count 1: overestimated by 183; 2 by 72; 3 by 43, …). Due to overdispersion and zero-inflation, the standard Poisson and negative binomial model were too high for low values. The zero-inflated and hurdle model’s zero prediction was equal to the number of zeros observed (1265). From Fig 3, without the ZINB and HNB models, the ZIP and HP was a bit high for low values as the counts moves away from zero but gave a better approximation to the observed frequencies, as it gets larger. S1 Fig presents the rectangles/bars of the observed frequencies plotted against the predicted ones for first 15 egg counts and in log axis.
ZINB and HNB regression model
The regression output results for the zero-inflated negative binomial and hurdle model parameter estimates where all sample size with ten predictors was used to simulate the models are shown in Table 3. From the results shown in Table 3, it can be seen that two variables (area, pipe borne, water by tankers (untreated) and well/borehole) of the negave variables (age groups <10 and 10–15, area, pre-school (educational level) and untreated water from tankers) have a significant effect as their p-values (Pr(>|Z |)) <0.05 for the ZINB and HNB model respectively. Also, three variables—area, pipe-borne water and water from well/borehole were significant with the positive counts with p-values, 0.017, 0.015 and 0.009 respectively for the ZINB model. Four variables—area, pipe-borne water, water from tankers (untreated) and well/borehole were significant with the positive counts with the p-values 0.009, 0.021, 0.005 and 0.002 respectively. Their respective standard errors too are attached.
Modelling and interpreting main effects
Table 4 shows the odds ratios (ORs) obtained from the logistic portion of the HNB model. Some confidence intervals for the ORs were observed to be wide. From Table 4, it is noted that people who live in urban areas have a lower risk of getting schistosomiasis infection as compared to people who live in rural areas. We can assess that children within the ages 10–15 (OR = 1.88, 95% CI: (0.56, 6.32)) have a higher risk of getting infected as compared to children whose age are greater than 15. Also, children who use untreated water delivered by water tankers (OR = 2.33, 95% CI: (1.00, 4.97)) have a greater risk of harboring schistosomiasis egg counts than those who use water from other sources.
Discussion
Schistosomiasis, an ailment caused by blood flukes of the Schistosoma genus, is a waterborne helminthic disease [36]. It predominantly affects underprivileged rural communities residing in tropical and subtropical areas, where access to safe drinking water and proper sanitation is limited. Infection occurs when individuals come into contact with infested natural freshwater bodies while engaging in activities such as fishing, agriculture, laundry, or swimming [37].
For the analysis of schistosomiasis egg count data, six distinct models applying either the Poisson or negative binomial distribution were taken into account. Various scenarios were considered to simulate the model, however, the focus of the discussions will be on the results when ten predictors and all the sample data was used. The count data was overdispersed and zero-inflated. The standard Poisson and NB performed poorly when fitted to the count data since they both recorded a higher information-theoretic criteria’s value in Table 2 compared to the zero-inflated (ZI) and hurdle models. Further, the ZI and hurdle models were fitted to the count data and by the AIC and BIC fit statistics with the model’s predictions shown in Fig 3 indicated that the ZINB (AIC = 1209, BIC = 1360) and HNB (AIC = 1207, BIC = 1358) model best fitted the data. Table 5 presents each model’s number of zeros captured when five and ten predictors were used against all the sample data and both the standard Poisson (717 zeros captured) and NB (801 zeros captured) failed to capture the number of observed zeros (1265 zeros) with a difference of 548 and 464 respectively when five predictors were used. Additionally, by introducing extra predictors, the Poisson (752 zeros captured) and NB (853 zeros captured) did not the number of observed zeros with a difference of 513 and 412 respectively. On the other hand, the ZI and hurdle models effectively accounted for all 1265 observed zeros when five predictors were employed. By incorporating additional predictors, all ZI and hurdle models successfully captured all zeros, except for the ZINB model which captured 1264 zeros, resulting in a difference of 1. This implies that there is no necessity to introduce additional parameters for capturing the number of observed zeros in the ZI and hurdle models. These findings emphasize the advantages of using ZI and hurdle models over the standard Poisson and NB models for analyzing zero-inflated count data. Moreover, the likelihood ratio test comparing the use of five and ten predictors for the ZINB model (p-value = 0.0711) and HNB model (p-value = 0.0969) does not show significance.
A review study [36] showed that the southern regions in Ghana have a mean schistosomiasis prevalence of 25.8% with a range of 3.3% to 83.9%. In this study, the infection prevalence among participant was 5.9%. We assessed the infection status and intensity in relation to factors such as age, sex, educational level, area of residence, parent’s occupation and their water sources. We found out that the factors linked to schistosomiasis infection include age (10–15 years), area of residence, pre-school education level, and the use of untreated water distributed by tankers. Furthermore, the factors associated with the intensity of schistosomiasis infection (positive counts) include place of residence and water sources such as untreated water from tankers, piped water, and well/borehole water. This confirms that high risk of infection are associated with specific areas and water sources [38–40]. Some research [41, 42] have shown that infection among children is associated with parent’s occupation but the occupation of parents did not show any significant influence on the involvement in schistosomiasis egg counts in this study. The implementation of control strategies is necessary to monitor the water sources, educate adults in rural areas on schistosomiasis infection, and guide parents in preventing their preschool children from accessing contaminated water. However, the results imply that safe drinking and well-treated water must be made available in endemic areas. An integrated approach to control that include sanitation improvement, health education, and less human-water interaction is required for success [43].
Conclusion
The zero-inflated negative binomial (ZINB) and hurdle negative binomial (HNB) models, which are frequently employed to model zero-inflated count data with under-dispersion and over-dispersion, provided the most satisfactory fit among all the models used and performed the best. These models (ZINB and HNB) were used to identify key factors affecting the quantity of schistosomiasis egg counts in children. Factors such as age (10–15 years), residence in rural areas, pre-school educational level, reliance on untreated water sources like well/borehole water, untreated waters from tankers, and piped water were found to be associated with the numbers of positive egg counts. It was observed that the hurdle negative binomial outperformed the zero-inflated negative binomial model when ten set of predictors were used and this is based on the information-theoretic criteria used. Moreover, all zero-inflated models accommodated the excess zeros, recommending it for modelling count data with excess zeros. Extending the modelling approaches to perform geospatial analysis to predict areas of high transmission in order to inform control strategies is recommended.
Supporting information
S1 Fig. A bar relationship between observed and model’s expected frequencies (in log axis).
https://doi.org/10.1371/journal.pone.0304681.s001
(TIF)
S1 Appendix. Data and code availability.
The data, R and python codes for model comparison and analysis can be found at https://github.com/kojonketia/Using-hurdle-models-to-analyze-schistosomiasis-count-data.
https://doi.org/10.1371/journal.pone.0304681.s002
(DOCX)
References
- 1.
Sturrock RF. The Schistosomes and their intermediate hosts. In: Tropical medicine. 2001. p. 7–83. Available from: https://doi.org/10.1142/9781848161511_0002
- 2. Ahmed SH. Schistosomiasis (Bilharzia): Background, Pathophysiology, Etiology. eMedicine. 2023 Mar 23; Available from: https://emedicine.medscape.com/article/228392-overview?form=fpf
- 3. Caffrey CR. Schistosomiasis and its treatment. Future Med Chem. 2015;7(6):675–676. pmid:25996057
- 4. Bosompem KM, Bentum IA, Otchere J, Anyan WK, Brown CA, Osada Y, et al. Infant schistosomiasis in Ghana: a survey in an irrigation community. Trop Med Int Health. 2004 Aug;9(8):917–22. pmid:15303998
- 5. Sumbele IUN, Tabi DB, Teh RN, Njunda AL. Urogenital schistosomiasis burden in school-aged children in Tiko, Cameroon: a cross-sectional study on prevalence, intensity, knowledge and risk factors. Trop Med Health. 2021 Sep 16;49(1):75. pmid:34530935
- 6. Adenowo AF, Oyinloye BE, Ogunyinka BI, Kappo AP. Impact of human schistosomiasis in sub-Saharan Africa. Braz J Infect Dis. 2015 Mar-Apr;19(2):196–205. Epub 2015 Jan 27. pmid:25636189
- 7. Grimes JE, Croll D, Harrison WE, Utzinger J, Freeman MC, Templeton MR. The roles of water, sanitation and hygiene in reducing schistosomiasis: a review. Parasit Vectors. 2015 Mar 13;8:156. pmid:25884172
- 8. Chipeta MG, Ngwira B, Kazembe LN. Analysis of Schistosomiasis haematobium infection prevalence and intensity in Chikhwawa, Malawi: an application of a two part model. PLoS Negl Trop Dis. 2013;7(3):e2131. Epub 2013 Mar 21. pmid:23556017
- 9. Khan A, Ullah S, Nitz J. Statistical modelling of falls count data with excess zeros. Inj Prev. 2011 Aug;17(4):266–70. Epub 2011 Jun 8. pmid:21653652
- 10.
Bhaktha N. Properties of Hurdle Negative Binomial Models for Zero-Inflated and Overdispersed Count data. Doctoral dissertation, Ohio State University. 2018. Available: http://rave.ohiolink.edu/etdc/view?acc_num=osu1543573678017356.
- 11.
Cameron AC, Trivedi PK. Model Specification and Estimation. In: Regression Analysis of Count Data. Cambridge: Cambridge University Press; 2013. p. 21–68. (Econometric Society Monographs).
- 12.
Mohammed. Zero-Inflated Models. Otago University. Available: https://www.otago.ac.nz/ripe/otago301201.pdf.
- 13. Heilbron DC. Zero‐Altered and other Regression Models for Count Data with Added Zeros. Biometrical Journal [Internet]. 1994 Jan 1;36(5):531–47. Available from: https://doi.org/10.1002/bimj.4710360505.
- 14. Perumean-Chaney SE, Morgan C, McDowall D, Aban I. Zero-inflated and overdispersed: what’s one to do? Statistical Computation and Simulation/Journal of Statistical Computation and Simulation [Internet]. 2013 Sep 1;83(9):1671–83. Available from: https://doi.org/10.1080/00949655.2012.668550
- 15. Lambert D. Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics. 1992 Feb 1;34(1):1. Available from: https://doi.org/10.2307/1269547.
- 16. Hu MC, Pavlicova M, Nunes EV. Zero-inflated and hurdle models of count data with extra zeros: examples from an HIV-risk reduction intervention trial. Am J Drug Alcohol Abuse. 2011 Sep;37(5):367–75. pmid:21854279
- 17. Feng CX. A comparison of zero-inflated and hurdle models for modeling zero-inflated count data. J Stat Distrib Appl. 2021;8(1):8. Epub 2021 Jun 24. pmid:34760432
- 18. Obeng BB, Aryeetey YA, de Dood CJ, Amoah AS, Larbi IA, Deelder AM, et al. Application of a circulating-cathodic-antigen (CCA) strip test and real-time PCR, in comparison with microscopy, for the detection of Schistosoma haematobium in urine samples from Ghana. Ann Trop Med Parasitol. 2008 Oct;102(7):625–33. pmid:18817603
- 19. Aryeetey YA, Essien-Baidoo S, Larbi IA, Ahmed K, Amoah AS, Obeng BB, et al. Molecular diagnosis of Schistosoma infections in urine samples of school children in Ghana. Am J Trop Med Hyg. 2013 Jun;88(6):1028–31. Epub 2013 Mar 25. pmid:23530072
- 20. Dassah S, Asiamah GK, Harun V, Appiah-Kubi K, Oduro A, Asoala V, et al. Urogenital schistosomiasis transmission, malaria and anemia among school-age children in Northern Ghana. Heliyon. 2022 Sep 2;8(9):e10440. pmid:36119865
- 21. Angora EK, Boissier J, Menan H, Rey O, Tuo K, Touré AO, et al. Prevalence and Risk Factors for Schistosomiasis among Schoolchildren in two Settings of Côte d’Ivoire. Trop Med Infect Dis. 2019 Jul 23;4(3):110. pmid:31340504
- 22. Nelwan M. Risk factors of schistosomiasis. Social Science Research Network [Internet]. 2020 Jan 1; Available from: https://doi.org/10.2139/ssrn.3722691.
- 23. Agresti A. Categorical data analysis. Wiley series in probability and statistics. 2002. Available from: https://doi.org/10.1002/0471249688
- 24. Liaqat M, Kamal S, Fischer F, Zia N. Zero-inflated and hurdle models with an application to the number of involved axillary lymph nodes in primary breast cancer. Journal of King Saud University—Science. 2022;34(4):101932.
- 25. Baetschmann G, Winkelmann R. A dynamic hurdle model for zero-inflated count data: with an application to health care utilization. Social Science Research Network. 2014;(151).
- 26.
Ridout M, Demétrio CGB, Hinde J. Models for count data with many zeros. International Biometric Conference. Cape Town. 13. 1–13; 1998.
- 27. Vuong QH. Likelihood ratio tests for model selection and Non-Nested hypotheses. Econometrica [Internet]. 1989 Mar 1;57(2):307. Available from: https://doi.org/10.2307/1912557
- 28. Miaou SP. The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accident Analysis and Prevention. 1994 Aug 1;26(4):471–82. Available from: https://doi.org/10.1016/0001-4575(94)90038-8 pmid:7916855
- 29. Brewer MJ, Butler A, Cooksley SL. The relative performance of AIC, AICC and BIC in the presence of unobserved heterogeneity. Methods in Ecology and Evolution [Internet]. 2016 Jun 1;7(6):679–92. Available from: https://doi.org/10.1111/2041-210x.12541
- 30.
Hilbe JM. Modeling count data. 1st ed., Cambridge University Press, 2014.
- 31. Zhang J, Yang Y, Ding J. Information criteria for model selection. Wiley Interdisciplinary Reviews Computational Statistics. 2023 Feb 20;15(5). Available from: https://doi.org/10.1002/wics.1607.
- 32. Aswi A, Astuti SA, Sudarmin S. Evaluating the performance of Zero-Inflated and Hurdle Poisson models for modeling overdispersion in count data. Inferensi: Jurnal Statistika. 2022;5(1):17.
- 33. Zeileis A, Kleiber C, Jackman S. Regression Models for Count Data in R. J. Stat. Soft. [Internet]. 2008 Jul. 29 [cited 2024 Mar. 26];27(8):1–25. Available from: https://www.jstatsoft.org/index.php/jss/article/view/v027i08.
- 34. Aemero M, Berhe N, Erko B. Status of Schistosoma mansoni prevalence and intensity of infection in geographically apart endemic localities of Ethiopia: a comparison. Ethiop J Health Sci. 2014 Jul;24(3):189–94. pmid:25183924
- 35. Kleiber C, Zeileis A. Visualizing count data regressions using rootograms. The American Statistician. 2016;70(3):296–303.
- 36. Boateng EM, Dvorak J, Ayi I, Chanova M. A literature review of schistosomiasis in Ghana: a reference for bridging the research and control gap. Trans R Soc Trop Med Hyg. 2023 Jun 2;117(6):407–417. pmid:36688317
- 37.
World Health Organization: WHO. WHO launches new guideline for the control and elimination of human schistosomiasis. WHO [Internet]. 2022 Feb 22; Available from: https://www.who.int/news/item/22-02-2022-who-launches-new-guideline-for-the-control-and-elimination-of-human-schistosomiasis.
- 38.
World Health Organization: WHO. Schistosomiasis. WHO [Internet]. 2023. Available from: https://www.who.int/news-room/fact-sheets/detail/schistosomiasis.
- 39. Reitzug F, Ledien J, Chami GF. Associations of water contact frequency, duration, and activities with schistosome infection risk: A systematic review and meta-analysis. PLoS Negl Trop Dis. 2023;17(6):e0011377. Published 2023 Jun 14. pmid:37315020
- 40. Hajissa K, Muhajir AEMA, Eshag HA, et al. Prevalence of schistosomiasis and associated risk factors among school children in Um-Asher Area, Khartoum, Sudan. BMC Res Notes. 2018;11(1):779. Published 2018 Oct 31. pmid:30382901
- 41. Abubakar BM, Abubakar A, Moi IM, et al. Urinary schistosomiasis and associated risk factors among primary school students in the Zaki Local Government area, Bauchi State, Nigeria. Dr Sulaiman Al Habib Medical Journal. 2022;4(4):196–204.
- 42. Jin Y, Cha S, Kim Y, et al. Association Between the Prevalence of Schistosomiasis in Elementary School Students and Their Parental Occupation in Sudan. Korean J Parasitol. 2022;60(1):51–56. pmid:35247955
- 43.
Sokolow S. The History of Schistosomiasis in Ghana. Stanford University. Available: https://schisto.stanford.edu/pdf/Ghana.pdf.