Overdispersion is very common in count data arising in many practical fields of the health sciences. Ignoring overdispersion in the analysis of such data may produce misleading inferences and thus incorrect interpretations of the results. Researchers should account for overdispersion and select an appropriate model for the analysis of such data. In this paper, Generalized Linear Models (GLMs) are applied to the modelling and analysis of antenatal care (ANC) count data extracted from the Bangladesh Demographic and Health Survey (BDHS) 2014. The Pearson chi-square statistic and different score tests are used to investigate overdispersion in the analysis. Overdispersion is found to be significant in the antenatal health care count data, and so appropriate modelling is used to produce valid inferences for the regression parameters. The zero-truncated negative binomial regression (0-NBR) is found to be the best choice for analysing such data, which exclude zero counts. Study findings reveal that place of residence, order of birth, exposure to mass media, wealth index and education of mother have significant impacts on the ANC status of women during pregnancy in Bangladesh.
Citation: Hossain Z, Akter R, Sultana N, Kabir E (2020) Modelling zero-truncated overdispersed antenatal health care count data of women in Bangladesh. PLoS ONE 15(1): e0227824. https://doi.org/10.1371/journal.pone.0227824
Editor: Hua Wang, Victoria University, AUSTRALIA
Received: September 16, 2019; Accepted: December 29, 2019; Published: January 14, 2020
Copyright: © 2020 Hossain et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: https://dhsprogram.com/data/available-datasets.cfm.
Funding: RA was funded by the National Science and Technology (NST) Fellowship, Ministry of Science and Technology, Government of Bangladesh under the supervision of ZH.
Competing interests: The authors have declared that no competing interests exist.
In recent years, the modelling of count response data has been of primary interest in many fields such as public health, epidemiology and demography. Poisson regression (PR) is commonly used as the base or standard model for the analysis of count data. However, PR has limited applicability because of the assumption of equidispersion, i.e., equality of the population mean and variance. In practice, overdispersion is a common phenomenon in count data, as the variance is often found to be substantially greater than the mean of the counts. Ignoring overdispersion in the analysis of count data can therefore lead to misleading inferences about the regression parameters, and it should be taken into account in the modelling. The negative binomial regression (NBR) is an alternative to the PR model that accommodates such overdispersion. Moreover, the 0-NBR may be a further improvement for modelling count responses that contain no zero counts, for example, the number of children ever born, or the number of visits to a medically trained health care provider among women who received ANC during the whole course of pregnancy to ensure safe birth.
Maternal health care, especially during pregnancy, is a significant issue for the development of a country. Millions of women face major life-threatening complications during pregnancy in developing countries like Bangladesh [3–6]. Although pregnancy and delivery complications are very common among Bangladeshi women, researchers and the relevant authorities have paid very little attention to reducing these complications by ensuring adequate ANC visits during pregnancy. Currently, approximately 830 maternal deaths occur worldwide every day due to pregnancy and delivery complications, and developing countries account for 99% of these global maternal deaths. Between 1990 and 2015, the maternal mortality ratio (MMR) per 100,000 live births decreased by about 44% worldwide. However, during this time, the scenario has not improved in developing countries. The MMR in Bangladesh has been stable in recent years: the estimated MMR was 196 maternal deaths per 100,000 live births in 2016, which is almost the same as in 2010. One target of Sustainable Development Goal (SDG) 3 is to reduce the MMR to less than 70 maternal deaths per 100,000 live births by 2030, which is clearly a challenging goal.
ANC helps women prepare for safe delivery and understand the consequences of complications linked to pregnancy and childbirth. ANC interventions, and particularly the number of interventions, also contribute significantly to reducing maternal deaths and to reducing pregnancy and delivery complications. The World Health Organization (WHO) recommends at least four ANC visits during pregnancy under normal circumstances to ensure safe pregnancy and maternal health. However, Bangladesh does not meet this minimum adequate number of ANC visits.
Several socio-economic and demographic factors are related to the number of ANC visits of women during pregnancy. Women's age at birth is strongly associated with the status of ANC visits. Women from higher socio-economic groups have a higher rate of ANC visits than those from lower socio-economic groups. The impact of birth order is found to be significant: women are less likely to receive the minimum required number of ANC visits for second or higher order births than for their first birth. Women residing in rural areas of Bangladesh are less likely to receive at least four ANC visits than women from urban areas. Women’s education is also reported as a potential risk factor for standard ANC visits, antenatal morbidity and maternal health care during pregnancy [5, 6, 18, 19]. A lower level of ANC in a community has an impact on whether women stay at home during delivery. As a result, such women have a high chance of facing pregnancy and delivery complications.
It is evident from BDHS, 2014 that a substantial proportion of pregnant Bangladeshi women do not complete the minimum standard number of ANC visits (at least four) during the whole course of pregnancy needed to ensure safe motherhood. More precisely, 20.7% of women visit only once, 21% visit twice and 17.4% visit three times (Table 1), i.e., in total 59.1% of Bangladeshi women make fewer than four visits, whereas the standard number of ANC visits is at least four for ensuring safe pregnancy and childbirth. As a result, women who do not complete the minimum standard number of ANC visits are more likely than their counterparts to face major life-threatening complications during pregnancy. By completing the minimum number of ANC visits, women can reduce pregnancy complications and hence give birth safely. This in turn can contribute significantly to reducing maternal and child deaths in Bangladesh. It is a huge challenge to ensure that every pregnant woman in Bangladesh completes the minimum standard number of ANC visits. Very few studies have examined the factors that influence an increase in the number of ANC visits of pregnant women in Bangladesh. Therefore, in this paper, we aim to investigate the potential determinants of an adequate number of ANC visits among women receiving antenatal care from a medically trained provider in Bangladesh, adopting appropriate statistical modelling and using the latest secondary data on maternal health care extracted from BDHS, 2014. Overdispersion was tested by statistical methods and found to be significant in the data set used in this paper. Previous studies of such antenatal health care count data have ignored their overdispersed nature; it is taken into account in the current analysis to avoid misleading results and conclusions.
Materials and methods
Source of data and sampling design
We used the nationally representative survey data from BDHS, 2014 (the latest survey with information on maternal antenatal health care during pregnancy). A two-stage stratified random sampling design was used to collect the data. Bangladesh was stratified into 20 sampling strata in the survey. In the first stage, a total of 600 enumeration areas (EAs) were selected with probability proportional to size, independently within each stratum; 207 EAs were selected from urban regions and 393 EAs from rural regions. A complete list of households was then made in each of the chosen EAs and used as the second-stage sampling frame for the selection of households. In the second stage, 30 households were chosen systematically from each EA with an equal probability of selection. In total, 18000 households were selected for the survey: 6210 in urban areas and 11790 in rural areas. Information relating to females was collected from women in the age group 15-49 who were or had been married and had stayed in the selected households the night before the survey date.
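The household totals above follow directly from the number of EAs and the fixed take of 30 households per EA. The short Python check below simply reproduces that arithmetic from the figures quoted in the text; the variable names are illustrative only.

```python
# Figures as reported in the text for the BDHS 2014 two-stage design.
urban_eas, rural_eas = 207, 393
households_per_ea = 30

total_eas = urban_eas + rural_eas
urban_households = urban_eas * households_per_ea
rural_households = rural_eas * households_per_ea
total_households = total_eas * households_per_ea

print(total_eas, urban_households, rural_households, total_households)
```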
Variables included in the study
In this study, we aim to find the potential determinants of the number of visits to any ANC health care provider among women who received ANC during their pregnancy in Bangladesh. To reflect the up-to-date scenario of antenatal health care, information was used from women who had given birth to a live baby in the 5 years preceding the survey. This gave a data set of 6957 women. A number of socio-economic, demographic and community-facility variables, such as region, type of place of residence, birth order, mother’s age at birth, exposure to mass media, educational attainment, wealth index, membership of non-government organizations (NGOs) and women’s empowerment, were considered as covariates. Not all of these variables were available directly in the survey data. Related covariates found in the data set were combined to compute three new variables: exposure to media, NGO membership and women’s empowerment. Women who read newspapers or magazines, listen to the radio or watch television at least once per week were categorized as exposed to media. Women are considered to be NGO members if they are associated with any one of the following non-government organizations: Grameen Bank, Bangladesh Rural Advancement Committee (BRAC), Bangladesh Rural Development Board (BRDB), Association of Social Advancement (ASA), Proshika, or Mother’s Club. Women who can decide independently on at least one of the following aspects are considered to be empowered: personal health care, purchasing main household goods, children’s health care, and visiting family or relatives. After removing all cases with missing values in these variables and restricting to women who received ANC from any medically trained provider, the final data set consisted of 3478 observations.
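The three derived covariates described above are simple "at least one of" rules. The sketch below illustrates that logic in Python; the function names, dictionary keys and the example record are hypothetical and do not correspond to actual BDHS variable names or to the authors' code.

```python
# Hypothetical encoding of the derived covariates; all identifiers are
# illustrative and are not BDHS variable names.
NGOS = {"Grameen Bank", "BRAC", "BRDB", "ASA", "Proshika", "Mother's Club"}
DECISIONS = {"own_health_care", "major_household_purchases",
             "child_health_care", "visits_to_family"}

def exposed_to_media(weekly_media):
    """True if the woman uses print, radio or television at least once a week."""
    return any(weekly_media.get(m, False) for m in ("print", "radio", "television"))

def is_ngo_member(memberships):
    """True if the woman belongs to at least one of the listed NGOs."""
    return any(m in NGOS for m in memberships)

def is_empowered(independent_decisions):
    """True if the woman decides independently on at least one listed aspect."""
    return any(d in DECISIONS for d in independent_decisions)

# A made-up record, for illustration only.
record = {
    "weekly_media": {"print": False, "radio": False, "television": True},
    "memberships": ["BRAC"],
    "independent_decisions": [],
}
derived = {
    "media_exposure": exposed_to_media(record["weekly_media"]),
    "ngo_member": is_ngo_member(record["memberships"]),
    "empowered": is_empowered(record["independent_decisions"]),
}
print(derived)
```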
The distribution of the number of ANC visits to any health care provider by Bangladeshi women during their pregnancy (mean = 3.53, variance = 5.658, i.e., the variance exceeds the mean) is given in Table 1.
Overdispersion arises when the variance of the responses is higher than the mean in a PR model. This is a common phenomenon in real count data sets. The violation of distributional assumptions may also cause overdispersion. Therefore, to avoid misleading inferences and results, overdispersion should be taken into consideration when analysing the antenatal health care count data of women during pregnancy in Bangladesh. To detect the presence of overdispersion in a data set, the value of the Pearson residual chi-square (χ²) statistic [1, 21] divided by the corresponding degrees of freedom (df) is used. This value is known as the dispersion. If the dispersion is greater than 1, the model is overdispersed; a value of 1 indicates an equidispersed model and a value less than 1 an underdispersed model. The Pearson χ²-statistic is defined as

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{V(\mu_i)},$$

where $y_i$ and $\mu_i$ represent the observed and expected counts respectively. Moreover, the variance function $V(\mu_i)$ is equal to $\mu_i$ for the PR model and $\mu_i + k\mu_i^2$ for the NBR model, where $k$ is the dispersion parameter (denoted κ in the model description below).
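The dispersion statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name is hypothetical, the degrees of freedom are assumed to be the residual df (number of observations minus number of estimated parameters), and the toy counts are made up.

```python
def pearson_dispersion(y, mu, n_params, k=0.0):
    """Pearson chi-square divided by residual df.

    Uses V(mu) = mu + k * mu**2, so k = 0 gives the Poisson variance function
    and k > 0 the negative binomial one. Residual df = n - n_params is an
    assumption of this sketch.
    """
    chi2 = sum((yi - mi) ** 2 / (mi + k * mi ** 2) for yi, mi in zip(y, mu))
    df = len(y) - n_params
    return chi2 / df

# Toy counts with an intercept-only fit: every fitted mean is the sample mean.
y = [1, 2, 2, 3, 4, 7, 1, 5]
mu = [sum(y) / len(y)] * len(y)

poisson_disp = pearson_dispersion(y, mu, n_params=1)
nb_disp = pearson_dispersion(y, mu, n_params=2, k=0.2)  # k = 0.2 is arbitrary
print(poisson_disp, nb_disp)
```

A value above 1 under the Poisson variance function points towards overdispersion; moving to the negative binomial variance function enlarges each denominator and so lowers the Pearson χ² itself.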
Overdispersion is natural in real count data sets. The question then arises of whether the amount of overdispersion is statistically significant and should be taken into consideration in the analysis of such data. This requires a formal statistical test, which can be carried out using the score test. Three versions of the score test statistic are defined by Dean and Lawless (1989).
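As an illustration of how such a test can be computed, the sketch below implements one score statistic commonly attributed to Dean and Lawless (1989) for testing a Poisson model against a negative binomial alternative, T = Σ{(y_i − μ̂_i)² − y_i} / √(2 Σ μ̂_i²), which is asymptotically standard normal under the Poisson null. Which three versions the authors actually used is not recoverable from the text, so this specific form, the one-sided p-value and the function name are assumptions of the example.

```python
import math

def score_test_overdispersion(y, mu_hat):
    """Score statistic and one-sided p-value for Poisson vs. NB overdispersion.

    mu_hat are fitted means from a Poisson regression. Large positive values
    of the statistic indicate overdispersion.
    """
    numerator = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu_hat))
    denominator = math.sqrt(2.0 * sum(mi ** 2 for mi in mu_hat))
    z = numerator / denominator
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return z, p_value

# Toy counts with an intercept-only Poisson fit (fitted mean = sample mean).
y = [1, 2, 2, 3, 4, 7, 1, 5]
mu_hat = [sum(y) / len(y)] * len(y)
z, p_value = score_test_overdispersion(y, mu_hat)
print(z, p_value)
```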
We consider three regression models, Poisson, negative binomial (NB) and zero-truncated negative binomial (0-NB), in the context of the GLM framework [25, 26] for the modelling and analysis of antenatal health care count data in Bangladesh. The modelling and analysis of count response data may start with PR as the base or standard model. However, count data are often found to be overdispersed in practice, as was the case for the data set considered in this study. Therefore, one may consider the NBR as an alternative to improve the modelling and data analysis. In the NBR, a multiplicative random effect U is used to model the unobserved variability in the data set. Let Y be the count response variable; the conditional distribution of Y given the unobserved random variable U = u is then Poisson with mean and variance equal to λu, i.e., Y|U ∼ Poisson(λu). For computational simplicity, the random effect U is assumed to follow the gamma distribution with shape parameter δ and scale parameter δ⁻¹, i.e., U ∼ Gamma(δ, δ⁻¹). Thus the unconditional marginal distribution of Y is the NB distribution with mean E(Y) = μ and variance V(Y) = μ + κμ², where μ = λ > 0 and the overdispersion parameter κ = δ⁻¹. It follows that the variance is greater than the mean as κ > 0, and consequently the overdispersion in the data is accounted for by the NBR, in contrast to the PR.
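The Poisson-gamma construction above can be checked by simulation. The following sketch draws U from a gamma distribution with shape δ and scale 1/δ, then Y given U from a Poisson distribution with mean λu, and compares the sample mean and variance with μ = λ and μ + κμ². The parameter values, seed and function names are arbitrary choices for illustration, and the Poisson sampler is a simple textbook method suitable only for small rates.

```python
import math
import random

def poisson_draw(rate, rng):
    """Draw one Poisson(rate) variate by Knuth's multiplication method."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def nb_mixture_sample(lam, kappa, size, seed=2014):
    """Sample Y | U ~ Poisson(lam * U) with U ~ Gamma(shape=1/kappa, scale=kappa)."""
    rng = random.Random(seed)
    delta = 1.0 / kappa
    draws = []
    for _ in range(size):
        u = rng.gammavariate(delta, 1.0 / delta)  # E(U) = 1, Var(U) = kappa
        draws.append(poisson_draw(lam * u, rng))
    return draws

lam, kappa = 3.5, 0.25  # arbitrary illustrative values
sample = nb_mixture_sample(lam, kappa, size=50_000)
sample_mean = sum(sample) / len(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)
theoretical_var = lam + kappa * lam ** 2

print(f"mean: {sample_mean:.3f} (theory {lam})")
print(f"variance: {sample_var:.3f} (theory {theoretical_var})")
```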
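It can also be shown that the Poisson and NB distributions belong to the exponential family. In GLMs, the mean of the responses is modelled through a link function g, with g(μ) expressed in terms of a set of covariates and regression parameters. To construct GLMs for Poisson and NB responses, the natural logarithm may be considered as the link function. It permits the direct comparison of the NB model with κ = 0 with the PR model. Let $x_i = (x_{i1}, \ldots, x_{ip})^\top$ be the p × 1 vector of covariates and let $\beta = (\beta_1, \ldots, \beta_p)^\top$ represent the p × 1 vector of regression parameters. Then the form of a GLM may be written as $g(\mu_i) = \log(\mu_i) = x_i^\top \beta$. In the case of the NBR model, we estimate $\theta = (\beta, \kappa)^\top$ by maximizing the log-likelihood function $l(\theta, y)$, given by

$$l(\theta, y) = \sum_{i=1}^{n} \left[ y_i \log\left(\frac{\kappa\mu_i}{1+\kappa\mu_i}\right) - \frac{1}{\kappa}\log(1+\kappa\mu_i) + \log\Gamma\left(y_i + \frac{1}{\kappa}\right) - \log\Gamma(y_i + 1) - \log\Gamma\left(\frac{1}{\kappa}\right) \right].$$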
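To make the NB log-likelihood with a log link concrete, the sketch below evaluates it for given β and κ using only the Python standard library. It assumes the usual NB parameterization with mean μ and variance μ + κμ²; the function names and the small data set are hypothetical, and no maximization is performed.

```python
import math

def nb_logpmf(y, mu, kappa):
    """Log-probability of count y under NB with mean mu, variance mu + kappa*mu^2."""
    inv_k = 1.0 / kappa
    return (math.lgamma(y + inv_k) - math.lgamma(inv_k) - math.lgamma(y + 1)
            + y * math.log(kappa * mu / (1.0 + kappa * mu))
            - inv_k * math.log1p(kappa * mu))

def nb_loglik(beta, kappa, X, y):
    """NB log-likelihood with log link: log(mu_i) = x_i' beta."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        mu_i = math.exp(sum(b * x for b, x in zip(beta, x_i)))
        total += nb_logpmf(y_i, mu_i, kappa)
    return total

# Made-up design matrix (intercept plus one binary covariate) and counts.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [2, 3, 5, 4]
loglik = nb_loglik(beta=[1.0, 0.5], kappa=0.2, X=X, y=y)
print(loglik)
```

In practice the parameters would be estimated by passing the negative of `nb_loglik` to a numerical optimizer or by using a statistical package's NB regression routine.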
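The response variable, the number of visits to a medically trained health care provider among women who received antenatal care during pregnancy, excludes zero counts, and one may then consider the 0-NB distribution. The probability of a zero count under the NB distribution, $(1 + \kappa\mu)^{-1/\kappa}$, is subtracted from the total probability 1 and the NB probabilities are rescaled accordingly to develop the corresponding theory for the 0-NBR. The log-likelihood is then given by

$$l(\theta, y) = \sum_{i=1}^{n} \left[ y_i \log\left(\frac{\kappa\mu_i}{1+\kappa\mu_i}\right) - \frac{1}{\kappa}\log(1+\kappa\mu_i) + \log\Gamma\left(y_i + \frac{1}{\kappa}\right) - \log\Gamma(y_i + 1) - \log\Gamma\left(\frac{1}{\kappa}\right) - \log\left\{1 - (1+\kappa\mu_i)^{-1/\kappa}\right\} \right].$$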
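The zero-truncation step can be illustrated numerically. The sketch below defines the zero-truncated NB log-probability by subtracting log{1 − P(Y = 0)} from the ordinary NB log-probability, then computes the truncated probabilities and mean for one arbitrary choice of μ and κ. Function names and parameter values are assumptions of the example.

```python
import math

def nb_logpmf(y, mu, kappa):
    """Log-probability of count y under NB with mean mu, variance mu + kappa*mu^2."""
    inv_k = 1.0 / kappa
    return (math.lgamma(y + inv_k) - math.lgamma(inv_k) - math.lgamma(y + 1)
            + y * math.log(kappa * mu / (1.0 + kappa * mu))
            - inv_k * math.log1p(kappa * mu))

def nb_prob_zero(mu, kappa):
    """P(Y = 0) under the NB model: (1 + kappa*mu)^(-1/kappa)."""
    return math.exp(-math.log1p(kappa * mu) / kappa)

def ztnb_logpmf(y, mu, kappa):
    """Zero-truncated NB log-probability, defined for y = 1, 2, ..."""
    if y < 1:
        raise ValueError("zero-truncated counts must be at least 1")
    return nb_logpmf(y, mu, kappa) - math.log(1.0 - nb_prob_zero(mu, kappa))

mu, kappa = 3.5, 0.25  # arbitrary illustrative values
probs = [math.exp(ztnb_logpmf(k, mu, kappa)) for k in range(1, 400)]
total_prob = sum(probs)
truncated_mean = sum(k * p for k, p in zip(range(1, 400), probs))
print(total_prob, truncated_mean)
```

Because the zero class is removed, the mean of the truncated distribution, μ / {1 − P(Y = 0)}, is larger than the untruncated mean μ.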
The Akaike information criterion (AIC) is used as the model selection criterion; the model with the smallest AIC value is chosen for the data analysis. It is given by

$$\mathrm{AIC} = -2\,l(\hat{\theta}, y) + 2p,$$

where $\hat{\theta}$ is the vector of estimated model parameters and p is the number of parameters. For the interpretation of results, it is convenient to use the incidence rate ratio (IRR) rather than the regression coefficients to investigate the effect of the covariates on the count response variable. The estimated IRR for the individual covariate $x_j$ is defined as

$$\widehat{\mathrm{IRR}}_j = \exp(\hat{\beta}_j),$$

where $\hat{\beta}_j$ is the j-th estimated regression coefficient for j = 1, …, p.
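Both quantities are straightforward to compute once a model has been fitted. The sketch below uses the standard definitions, AIC = −2 × log-likelihood + 2 × number of parameters and IRR = exp(coefficient), and converts an IRR to a percentage change in the expected count. The function names, log-likelihood value and coefficients are hypothetical; only the IRR values 0.87 and 1.11 are taken from the results reported in this paper.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion in its standard (unscaled) form."""
    return -2.0 * loglik + 2.0 * n_params

def incidence_rate_ratios(beta_hat):
    """IRR for each estimated coefficient on the log scale."""
    return [math.exp(b) for b in beta_hat]

def percent_change(irr):
    """Percentage change in the expected count implied by an IRR."""
    return 100.0 * (irr - 1.0)

# Hypothetical fitted values, for illustration only.
example_aic = aic(loglik=-100.0, n_params=3)
example_irrs = incidence_rate_ratios([0.0, math.log(1.19)])

# IRRs reported in the text: fourth and later births (0.87), media exposure (1.11).
for irr in (0.87, 1.11):
    print(f"IRR = {irr}: {percent_change(irr):+.0f}% change in expected visits")
print(example_aic, example_irrs)
```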
This article does not contain any studies with human participants performed by any of the authors. The Bangladesh Demographic and Health Surveys (BDHS) were approved by the ICF Macro Institutional Review Board and the National Research Ethics Committee of the Bangladesh Medical Research Council. A written consent form for the survey was given to participants before the interview. All respondent identifiers were removed (de-identified) before the data were published. Secondary data are used in the current study and are freely available on the DHS website (https://dhsprogram.com/data/available-datasets.cfm).
Results and discussion
In this study, we first investigated whether overdispersion was present in the antenatal health care count data of Bangladeshi pregnant women. To do this, we fitted the PR model and computed the dispersion value using the Pearson χ²-statistic. From Table 2, it can be seen that the Pearson dispersion value is 1.394, which indicates that the PR model is overdispersed. We then conducted the score tests with the PR model to examine whether this overdispersion is statistically significant. The results are given in the right panel of Table 2.
From the p-values (p<0.001) in Table 2, it is evident that the overdispersion is significant under the PR model. It follows that one should consider alternatives for the modelling and analysis of such overdispersed antenatal health care count response data. We then fitted the NBR and calculated the Pearson dispersion value. From Table 2, it can be seen that this value is 1.056, which is close to 1, so the overdispersion appears to be captured well by the NBR model. However, the count response variable (number of visits to a medically trained provider among women who received antenatal health care) excludes zero visits in this study. Therefore, we considered the 0-NBR model as a further improvement for the modelling and analysis of the antenatal health care count data. The dispersion value for this model is 1.003. As this value is very close to 1, the overdispersion in the data set is captured very well by the 0-NBR. Finally, we chose the 0-NBR model for the data analysis (smallest AIC = 4.024) in order to investigate the impact of potential determinants on the antenatal health care of women during their pregnancy in Bangladesh. The summary results obtained from fitting this model are presented in Table 3.
It is observed from the IRRs and p-values that the average numbers of ANC visits of women during pregnancy in Khulna and Rangpur divisions are 1.14 and 1.22 times, respectively, that of women who live in Dhaka division, and these differences are statistically significant. The average numbers of ANC visits of pregnant women from Barisal, Chittagong, Rajshahi and Sylhet are respectively 4%, 7%, 1% and 7% lower than that of women in the Dhaka region, but these differences are not statistically significant. The place of residence of women has a highly statistically significant (p<0.001) effect on the number of antenatal health care visits during the course of pregnancy. The average number of antenatal care visits of urban women during pregnancy is 1.19 times that of rural women.
The birth order is found to be significant in relation to receiving antenatal care from a medically trained provider during pregnancy. More specifically, the average number of antenatal health care visits is 13% lower during the pregnancy of a fourth or later birth (IRR = 0.87) than during the pregnancy of a second or third birth. The mean number of antenatal health care visits during pregnancy is 7% (p = 0.050) lower for women aged 20 years or below than for women in the age group 20-35 years at the time of delivering their baby. Exposure to media (p = 0.004) is found to be statistically significant for receiving antenatal health care during pregnancy. The average number of ANC visits is 11% higher for women who are exposed to media than for their counterparts (IRR = 1.11). From the reported IRRs for the different levels of education of women, it is clear that the average number of ANC visits increases significantly (primary: p = 0.043; secondary and higher: p<0.001) with increasing level of education. Women who completed primary and secondary levels of education make, on average, 1.13 and 1.28 times as many antenatal care visits to a medically trained health care provider during the course of pregnancy, respectively, as women who have no education. On average, the number of ANC visits to a health care provider of highly educated women is 1.61 times that of uneducated women.
It may also be clearly seen that the average number of antenatal care visits during pregnancy of women from rich families (p<0.001) is 12% higher than that of women who belong to middle class families. Furthermore, the average number of antenatal health care visits is 3% higher for women who are NGO members and for women who take part in important decisions than for women who are not NGO members or are not involved in any important decision, although neither effect is statistically significant.
The study results reveal that one should account for overdispersion in real count data sets in order to assess the significance of regression parameters precisely. It is also necessary to test whether the overdispersion is significant before selecting a suitable model for the analysis of count response data. We used the latest survey data (BDHS, 2014) to investigate the factors associated with the antenatal health care of women during the course of pregnancy in Bangladesh. Because of overdispersion in the data, the zero-truncated negative binomial model was chosen as suitable for estimating the average number of antenatal care visits of Bangladeshi women. The findings from the analysis show that region or division, place of residence, birth order, exposure to media, educational status of women and wealth index have significant effects on the number of antenatal visits to a medically trained health care provider. Among the significant factors, place of residence, education, wealth index and exposure to media have positive effects, while birth order has a negative effect on the antenatal health care visits of Bangladeshi women. More precisely, women from urban areas, women belonging to rich families, women with a higher level of education, and women who are exposed to media receive more antenatal care by visiting medically trained health providers. Moreover, women receive less antenatal care during the pregnancy of a fourth or later birth than during the pregnancy of a second or third birth in Bangladesh.
Based on the findings of this study, awareness should be created among women who give birth to their first child at the age of 20 years or below so that they receive more antenatal health care during pregnancy. Moreover, effective attention should be given to motivating women to receive more antenatal care, irrespective of birth order, for a safe birth. The education of women needs to be given top priority, and women's access to media should also be facilitated for better antenatal care in Bangladesh. Furthermore, basic appropriate maternal health care services must be ensured in rural areas and among women belonging to poor families in order to reduce the risk of pregnancy complications, and maternal and child morbidity and mortality in Bangladesh.
The authors would like to acknowledge the DHS (Demographic and Health Surveys) Program, which made the data freely available, and the National Science and Technology (NST) Fellowship, Ministry of Science and Technology, Government of Bangladesh, for funding this work. We would like to express our gratitude to the Department of Statistics, University of Dhaka, Bangladesh and the Faculty of Health, Engineering and Sciences (HES) of the University of Southern Queensland, Australia for technical support.
- 1. Hilbe JM. Negative Binomial Regression. Cambridge University Press: 2nd edition.; 2011.
- 2. Kiser H, Hossain MA. Estimation of number of ever born children using zero truncated count model: evidence from Bangladesh Demographic and Health Survey. Health Information Science and Systems. 2019; 7(3): https://doi.org/10.1007/s13755-018-0064-y. pmid:30588293
- 3. Islam M, Sultana N. Risk factors for pregnancy related complications among urban slum and non-slum women in Bangladesh. BMC Pregnancy and Childbirth. 2019; 19(235): https://doi.org/10.1186/s12884-019-2392-6.
- 4. Latif AHMM, Hossain MZ, Islam MA. Model Selection Using Modified Akaike’s Information Criterion: An Application to Maternal Morbidity Data. Austrian Journal of Statistics. 2008; 37(2): 175–184: https://doi.org/10.17713/ajs.v37i2.298.
- 5. Chowdhury RI, Islam MA, Chakraborty N, Akhter HH. Determinants of Antenatal Morbidity: A Multivariate Analysis. World health & population. 2007; 9(3): 9–18: https://doi.org/10.12927/whp.2007.19038.
- 6. Mohammad KA, Zahura FT, Rahman MM. Importance of maternal education on antenatal care visits in Bangladesh. Bangladesh Journal of Scientific Research. 2017; 30(1&2): 23–33.
- 7. WHO. World Health Organisation. Maternal mortality. 2019: https://www.who.int/news-room/fact-sheets/detail/maternal-mortality, Accessed 2019-07-23.
- 8. BMMS. Bangladesh Maternal Mortality and Health Care Survey (2016). National Institute of Population Research and Training (NIPORT), International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), and MEASURE Evaluation. 2017. Preliminary Report. Dhaka, Bangladesh, and Chapel Hill, NC, USA: NIPORT, icddr,b, and MEASURE Evaluation.
- 9. WHO. World Health Organisation: Sustainable Development Goals (SDG) 3. Ensure healthy lives and promote wellbeing for all at all ages. 2018: https://www.who.int/sdg/targets/en/, Accessed 2019-07-23.
- 10. Titaley CR, Hunter CL, Heywood P, Dibley MJ. Why don’t some women attend antenatal and postnatal care services?: a qualitative study of community members’ perspectives in Garut, Sukabumi and Ciamis districts of West Java Province, Indonesia. BMC pregnancy and childbirth. 2010; 10(61): http://www.biomedcentral.com/1471-2393/10/61.
- 11. Pandit RD. Role of Antenatal Care in Reducing Maternal Mortality. Asia-Oceania Journal of Obstetrics and Gynaecology. 1992; 18(1): 1–6: https://doi.org/10.1111/j.1447-0756.1992.tb00291.x. pmid:1627055
- 12. WHO. World Health Organisation. Standards for Maternal and Neonatal Care. 2007: https://www.who.int/reproductivehealth/publications/maternal_perinatal_health/a91272/en/. Accessed 2019-07-23.
- 13. BDHS. Bangladesh Demographic and Health Survey (BDHS) 2014. NIPORT, Mitra and Associates; Dhaka Bangladesh, The DHS Program, ICF International; Rockville, Maryland, USA.
- 14. Nisar N, White F. Factors affecting utilization of antenatal care among reproductive age group women (15-49 years) in an urban squatter settlement of Karachi. Journal of Pakistan Medical Association. 2003; 53(2): 47–53: https://ecommons.aku.edu/pakistan_fhs_mc_chs_chs/459.
- 15. Jayaraman A, Chandrasekhar S, Gebreselassie T. Factors Affecting Maternal Health Care Seeking Behavior in Rwanda. DHS Working Papers; USAID. 2008; 59: https://www.dhsprogram.com/pubs/pdf/WP59/WP59.pdf
- 16. Navaneetham K, Dharmalingam A. Utilization of maternal health care services in Southern India. Social Science & Medicine. 2002; 55(10): 1849–1869: https://doi.org/10.1016/S0277-9536(01)00313-6.
- 17. Rahman KMM. Determinants of Maternal Health Care Utilization in Bangladesh. Research Journal of Applied Sciences. 2009; 4(3): 113–119.
- 18. Becker S, Peters DH, Gray RH, Gultiano C, Black RE. The determinants of use of maternal and child health services in Metro Cebu, the Philippines. Health Transitional Review. 1993; 3(1): 77–89: https://www.ncbi.nlm.nih.gov/pubmed/10148799.
- 19. Chakraborty N, Islam MA, Chowdhury RI, Bari W, Akhter HH. Determinants of the use of maternal health services in rural Bangladesh. Health Promotion International. 2003; 18(4): 327–337: https://doi.org/10.1093/heapro/dag414. pmid:14695364
- 20. Yebyo H, Alemayehu M, Kahsay A. Why Do Women Deliver at Home? Multilevel Modeling of Ethiopian National Demographic and Health Survey Data. PLoS One. 2015; 10(4): https://doi.org/10.1371/journal.pone.0124718. pmid:25874886
- 21. Stroup WW. Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. CRC press: 2012.
- 22. Dean C, Lawless JF. Tests for detecting overdispersion in Poisson regression models. Journal of the American Statistical Association. 1989; 84(406):467–472.
- 23. Cameron AC, Trivedi PK. Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics. 1990; 46(3): 347–364.
- 24. Winkelmann R. Econometric Analysis of Count Data. New York: 5th edition.; 2008.
- 25. McCullagh P, Nelder JA. Generalized Linear Models. London: 2nd edition.; 1989.
- 26. Dobson AJ. An Introduction to Generalized Linear Models. New York: 3rd edition.; 2008.
- 27. Akaike H. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory. Akademia Kiado Budapest. 1973; 267–281.