Modelling the number of antenatal care visits in Bangladesh to determine the risk factors for reduced antenatal care attendance

The existence of excess zeros in the distribution of antenatal care (ANC) visits in Bangladesh raises the research question of whether there are two separate generating processes in taking ANC and the frequency of ANC. Thus the main objective of this study is to identify a proper count regression model for the number of ANC visits by pregnant women in Bangladesh covering the issues of overdispersion, zero-inflation, and intra-cluster correlation with an additional objective of determining risk factors for ANC use and its frequency. The data have been extracted from the nationally representative 2014 Bangladesh Demographic and Health Survey, where 22% of the total 4493 women did not take any ANC during pregnancy. Since these zero ANC visits can be either structural or sampling zeros, two-part zero-inflated and hurdle regression models are investigated along with the standard one-part count regression models. Correlation among response values has been accounted for by incorporating cluster-specific random effects in the models. The hurdle negative binomial regression model with cluster-specific random intercepts in both the zero and the count part is found to be the best model according to various diagnostic tools including likelihood ratio and uniformity tests. The results show that women who have poor education, live in poor households, have less access to mass media, or belong to the Sylhet and Chittagong regions are less likely to use ANC and also have fewer ANC visits. Additionally, women who live in rural areas, depend on family members’ decisions to take health care, and have unintended pregnancies had fewer ANC visits. The findings recommend taking both cluster-specific random effects and overdispersion and zero-inflation into account in modelling the ANC data of Bangladesh. Moreover, safe motherhood programmes still need to pay particular attention to disadvantaged and vulnerable subgroups of women.


Introduction
Following the third Sustainable Development Goal of reducing the global maternal mortality ratio (MMR) to 70 per 100,000 live births by 2030 [1,2], the Bangladesh government set targets to reduce the MMR to 143 and 105 per 100,000 live births, in 2015 and 2022 respectively. Though Bangladesh has significantly improved the MMR, the improvement stagnated at 196 per 100,000 live births, according to the Bangladesh Maternal Mortality and Health Care Surveys conducted in 2010 and 2016 [3]. Access to maternal health service for all women during their pregnancy period and childbirth is crucial to further reduce pregnancy-related morbidity and mortality. According to the WHO [4], antenatal care visits (ANC) should include at least four visits to medically trained personnel to avoid complications and have a safe delivery. Despite relatively high rates of ANC utilization and improved access to health facilities in Bangladesh, the number of pregnant women with at least this number of four ANC visits remains low (at 31%) and the number of women without any ANC visits remains high (at 22%) [5].
Many studies have assessed the determinants of prenatal care attendance in Bangladesh, particularly focussing on the WHO guideline of taking at least four ANC visits [6][7][8][9], but only a few studies have assessed the determinants of the number of ANC visits in general [10][11][12][13]. It has been shown that there may be two separate processes that generate decisions regarding the use and frequency of prenatal care use [14,15]. Without distinguishing these generating processes, Poisson regression (PR) and negative binomial regression (NBR) models have been widely used to model the number of ANC visits in Bangladesh [12,13,16]. But these models may provide inconsistent regression coefficient, as overdispersion and excess zeros remain unaccounted for [17]. In such situations, zero-inflated and hurdle regression (ZIR and HR) models can be applied [18]. These so-called two-part models have been used in a limited number of studies [10,11]. In addition to the problems of overdispersion and zero-inflation in the ANC data, correlation among measurements (a common phenomenon in longitudinal, repeated surveys, and clustered data) known as intra-cluster correlation (ICC) needs to be considered [19]. The problems associated with ICC can be partially solved by incorporating cluster specific random effects in the standard PR, NBR, HR, and ZIR models [20,21]. Guliani, Sepehri and Serieux [22] employed a two-part HR model incorporating such cluster effects in the model and explored the determinants of ANC use and the frequency of ANC visits utilizing ANC data from 32 developing countries.
To the best of our knowledge, no study has simultaneously considered all the issues discussed above in modelling the number of ANC visits of Bangladeshi women. Thus, the objectives of this study are two-fold: (i) to develop a proper count regression model for the number of ANC visits in Bangladesh covering the issues of overdispersion, zero-inflation, and ICC; and (ii) to determine the risk factors for no ANC use as well as the frequency of ANC visits.

Data description
In this study, data are extracted from the nationally representative 2014 Bangladesh Demographic and Health Survey (BDHS), where the country was stratified into 20 sampling strata according to urban and rural enumeration areas of 7 divisions [5]. A two-stage stratified sampling design was implemented to collect data: in the first stage 600 clusters (393 from rural and 207 from urban areas) were drawn with probability proportional to the enumeration area size and in the second stage 30 households per cluster were selected with an equal probability systematic procedure. The 2014 BDHS covers 17,863 ever-married women aged 15-49 years from 17,300 households. The information on ANC visits was collected from 4493 ever married-women who gave birth in the three years preceding the survey. Among women with two or more live births within the given period, information was only recorded for the last birth. Mothers were asked a number of questions about ANC visits and the received health care during the antenatal visits. The number of ANC visits (a non-negative integer) is the target response variable for which this study aims to identify a proper count regression model. A number of explanatory variables at individual (woman), household, community, and regional levels have been considered based on recent studies on ANC utilization [6,9,13]. The bivariate relationship of the explanatory variables with the number of ANC visits were examined first by a developing simple PR model for each of the explanatory variables. Individual-level explanatory variables in this study include education status of the women and their husbands, women's access to mass media exposure, women's decision-making power on their own healthcare issues, and women's desire of pregnancy along with household wealth status, and place of residence, and regional settings.

Statistical models
Let y ij denote the number of ANC visits of the i th women living in the j th cluster, and the vector X ij the corresponding values of the considered explanatory variables. Assuming independence of ANC visits of the women, the PR and NBR models are defined by: where μ ij is the expected number of ANC visits as a function of explanatory variables, β 0 the overall intercept, and β the vector of regression coefficients. The difference between the PR and the NBR model is in the assumed distribution of y ij in the respective models. In PR, the response variable is assumed to follow a Poisson distribution with E(y ij ) = μ ij = var(y ij ), while in NBR it is assumed to follow a negative binomial distribution with E(y ij ) = μ ij and varðy ij Þ ¼ m ij þ m 2 ij =y, where θ is the shape parameter which controls the dispersion. When θ!1, the NBR model converges to a PR model without overdispersion. Thus, the PR model is a limiting model of the NBR model as the dispersion ðm 2 ij =yÞ approaches zero. The PR and NBR models assume that the observations conditioned on the predictors are independent and identically distributed [23]. However, these assumptions may be violated in clustered data. Ignoring possible correlation in the data result in a model that could lead to biased estimates and misinterpretation of the results [24]. An acceptable way of accommodating this non-independence of observations is to use mixed-effects models, also known as multilevel models. The use of a multilevel modelling strategy accommodates the clustered or hierarchical nature of the BDHS data and corrects standard errors of the estimated coefficients for ICC. A simple mixed-effects PR/NBR model is obtained by incorporating cluster-specific random effects in the standard PR/NBR model: where b oj stand for the random intercepts at cluster level and are assumed to follow a normal distribution with constant variance. The mixed-effects PR and NBR models are referred to hereafter as MPR and MNBR models respectively.
The zero-inflated and hurdle extensions of the PR and NBR models are the most prominent and effective models not only to handle excess zeros in a count data but also to accommodate overdispersion resulting from the variance being greater than the mean [18]. Both ZIR and HR models have a mixture of two generating processes. In the ZIR model, the first process generates only zero counts (structural or genuine zeros) with probability φ ij , while the second process generates non-negative counts (which could result in zeros called sampling zeros) from either a Poisson or a negative binomial distribution with probability (1−φ ij ). Like the ZIR model, the HR models also assume that the first process generates only structural or genuine zeroes while the second process generates truncated positive counts from a zero-truncated Poisson or negative binomial distribution [25]. In relation to ANC visits, structural zeros occur if a pregnant woman would never visit and sampling zeros occur if she could visit but has no reason to do so within the specified time frame. In this study, women reported the ANC visit for their last birth over the three years preceding the survey. Accordingly, the reported zeros could be structural, but they could also be sampling errors due to incorrect recall or misspecification of the time frame. In the ZIR model, the distribution of the number of ANC visits is modeled as: and where f ij (.)is either a Poisson or a negative binomial distribution.
The basic difference between the two models is that the ZIR model uses two distribution for the zero counts, while the HR model use one distribution for the zero counts: The first so-called zero part of the process can be modelled as a binary or logit model. According to the modelling distribution (Poisson or negative binomial) of the second so-called count part of the process, the ZIR and HR models are referred to as either ZIPR/HPR or ZINBR/ HNBR. In general, separate explanatory variables can be used in the two parts of the ZIR and HR models. To explain these models mathematically, let X ij and Z ij be vectors of known explanatory variables used in the zero-and count-part models respectively. Then, the zero and count part models under the simple ZIR and HR models can be expressed, respectively, as: where γ 0 is the overall intercept, and γ is the vector of regression coefficients of the binary process (binary logistic model). The HR model separates the structural zeros from the non-zero responses by modelling non-zero counts with a truncated Poisson/negative binomial distribution. Consequently, the effects of covariates on φ ij in the HR model (on the log-odds of a structural zero) and their effects on φ ij in ZIR model (on the log-odds of a structural and sampling zero) are not equivalent [26,27]. The mixture of two parts in a ZIR/HR model allows to interpret separate answers of the two questions (i) which factors influence whether a pregnant woman will attend ANC or not and (ii) which factors predict the number of times she will take ANC. Moreover, explanatory variables may have different impacts in the two processes.
The mixed-effects ZIR and HR models can be expressed by adding a cluster-specific random component b 0j to β 0 in the count part and another cluster-specific random component c 0j to γ 0 in the zero part: When cluster-specific random effects are considered only in the count part (so c 0j = 0), the mixed-effects ZIR and HR models are denoted by MZIPR/MZINBR and MHPR/MHNBR respectively depending on the assumed distribution (Poisson or negative binomial) of the count part. Models with also a random effect in the zero part (usually it is considered as an extra random effect in the mixed-effects ZIR/HR model) are denoted hereafter as MZINBR. ERE (for example) for the MZINBR model. The two model processes or their log-likelihoods are assumed functionally independent, so the joint likelihood can be maximized by maximizing each part separately [26]. A maximum likelihood method approximating the integrals over the random effects with an adaptive Gaussian quadrature rule [28] was used to fit the mixed-effects ZIR and HR models. Several Rpackages were used to analyse different versions of the PR and NBR models. The recently developed "GLMMadaptive" package of Rizopoulos [29] was employed to fit mixed-effects ZIR and HR models. Significance of the dispersion parameter, zero-inflation, and goodnessof-fit of the model (H 0 : fitted model suits well for the data) which is further referred to as the uniformity test were assessed using the residual diagnostics for hierarchical (multi-level/ mixed) regression models available in the DHARMa (Diagnostics for Hierarchical Regression Models) package of Hartig [30].
Since the considered models are based on different assumptions, their direct comparison is complicated. A step by step comparison procedure is followed providing priority to the uniformity test and significance of the cluster-specific random effects to select the final model. The basic steps are: Step 1: First PR and NBR models with cluster-specific random intercept (MPR and MNBR) are examined assessing whether overdispersion and zero-inflation issues are covered by the fitted models.
Step 2: If zero-inflation remains, ZIR and HR models with (MZIPR, MZINBR, MHPR, MHNBR) or without (ZIPR, ZINBR, HPR, HNBR) cluster-specific random intercept are estimated and compared. Mixed-effects ZIR and HR models are developed considering cluster-specific random intercept only in the count part (say, MZINBR) as well as in both the count and the zero parts (say, MZINBR.ERE). Nested and non-nested models are compared using likelihood ratio (LR) and Vuong tests [31] respectively.
Step 3: The final model is selected using the DHARMa's uniformity test, assessing which model fits better for the data. Since mixed-effects ZIR and HR models are not directly comparable, the uniformity test is used to examine whether MZINBR or MHNBR (with or without extra cluster-specific random effects) suits the studied data better. Significance of the cluster variance component in the zero and count parts is assessed using the LR test.

Results
The distribution of the number of ANC visits shown in Fig 1 is positively skewed with low mean (2.75) and median (2.0) number of ANC visits. About 22% of the pregnant women did not take any ANC visits and only 31% took ANC at least 4 times during their pregnancy period. Table 1 shows mean and median numbers of ANC visits according to different background characteristics of the women. Both the mean and the median frequency significantly vary with all these characteristics. Women from the Khulna division had a higher mean (3.42) and median (3) number of ANC visits and those from the Sylhet division (2.02 and 1 respectively) had the lowest. As expected, urban women have a higher mean and median number of ANC visits than the rural women. Women's education and exposure to mass media (TV), their husbands' education and the household wealth status showed a significant positive association with the mean and median number of ANC visits. Women who wanted their pregnancy had a higher mean and median number of ANC visits than those with an unwanted pregnancy. Women's decision-making power on their own healthcare issues showed significant association with the mean and median number of ANC visits.

Model selection
Policy makers, stakeholders, and donors explore risk factors for reduced prenatal care use or lower frequency of visits to design strategies to improve maternal health care. A proper count regression model for the number of ANC visits incorporating multiple factors helps to identify those core risk factors. In this study, one-part regression models (such as PR and NBR) and two-part regression models (such as ZIR or HR) with and without consideration of ICC are compared to examine whether there are indeed two generating processes in the number of ANC visits as well as to determine the risk factors of the processes. A fixed set of explanatory variables was used in all models of ANC visits for comparison purposes. The comparison of the standard PR and NBR models with their mixed-effects models MPR and MNBR shown in Table 2 indicates that PR and MPR models fail to capture the overdispersion, while both NBR and MNBR do account for overdispersion, but all four models are unable to account for the issue of zero-inflation. Among these models, the NBR model is preferred by the DHARMa uniformity test with lower p-value (0.066). However the AIC, log- likelihood, and LR test indicate that inclusion of random intercepts are required for the studied ANC data. Thus, the ZIR and HR models without and with random intercepts were developed. The results of Vuong tests for non-nested models shown in Table 3 indicate that either ZINBR or HNBR can be considered to be the better model to account for excess zeros. Table 3 also reflects that the overdispersion is captured better by the NBR-based models than the PRbased models. The results of the LR tests for nested models shown in Table 4 indicate that cluster-specific random effects should be considered in the NBR-based models. Random effects are also found important for both the count-and the zero-part models in both cases of the ZINBR (MZINBR and MZINBR.ERE) and the HNBR (MHNBR and MHNBR.ERE) models.
Since there are four possible candidates to be the best model for the ANC data, the DHAR-Ma's uniformity test was performed to find the best suitable model among these. Table 5 shows that the ZINBR model with random intercepts in the count-part (MZINBR) confirms uniformity (p-value = 0.283) with the observed count data, but the LR test in Table 4 shows that this model still requires random intercepts in the zero part (MZINBR.ERE) (pvalue < 0.001). However, the MZINBR.ERE failed the uniformity test (p-value = 0.012). On the other hand, HNBR with random intercepts in the count part (MHNBR) and HNBR with random intercepts at both the count and the zero parts (MHNBR.ERE) passed the uniformity test (p-value = 0.549 and 0.118). Thus, MHNBR.ERE is considered to be the best model among the possible candidate models for the ANC data of Bangladesh. Also, the MHNBR and MHNBR.ERE models provide lower cluster-specific variance components (as well as lower ICC) than the MZINBR and MZINBR.ERE models. Do note that the same set of explanatory variables are maintained in all the one-part models and two-part models for comparison purposes. Informal diagnoses of the cluster-specific residuals through Q-Q and distribution plots of the standardized residuals shown in Fig 2 confirm that the cluster-specific residuals obtained from both the count and the zero parts are normally distributed with constant variance.

Risk factors
According to the selected HR model with random effects in both the count and the zero parts (the MHNBR.ERE model), division, place of residence, household wealth, women's media exposure, women's and their partner's education status, women's decision-making power on their own healthcare issues, and desire for pregnancy have highly significant effects on either zero prenatal care use or the frequency of prenatal care use (Fig 3 and Table 6). The count-part model shows the effects of the considered factors on the frequency of ANC visits represented as incidence rate ratio (IRR), while the zero-part model shows the effects of the considered factors on the women's decision to take no ANC represented as odds ratio (OR). The estimated IRR and OR with their 95% CI are in red and blue respectively in Fig 3. Since both parts have cluster-specific random effects, the estimated parameters represent the effects of individual-, household-, regional-, and community-level characteristics on ANC attendance and the frequency of ANC visits after controlling for the unobserved community level factors. It is noted that regression coefficients of other models are not presented here, only their summary statistics were reported for comparison purposes.
The results of the finally selected count-part model shown in Fig 3 and Table 6  The estimated variance components in the count part (s 2 c = 0.069) and the zero part (s 2 Z = 0.626) indicate significant community-level variation in the number of ANC visits, due to between-cluster heterogeneity.

Discussion
Aim of this study was to identify an appropriate count regression model for the number of ANC visits among pregnant women in Bangladesh utilizing recent nationally representative survey data. Since a substantial proportion of women did not take any prenatal care and the women are clustered according to the survey design, the performance of the standard Poisson and negative-binomial regression models have been compared with their zero-inflated and hurdle models with and without consideration of ICC in the model selection process.
The study has followed a systematic procedure to select the most appropriate count regression model for the frequency of ANC visits by examining a variety of criteria, particularly the existence of zero-inflation and community effects in the responses. It is found that the zero ANC visits are generated from two different processes and hence either zero-inflated or hurdle regression model should be used to model the frequency of ANC visits in Bangladesh. Since the household surveys in Bangladesh use a complex cluster sampling design, regression models should also incorporate correlation (unless the considered explanatory variables in the model could explain the cluster-level variability), to prevent biased estimates with unfortunate undercoverage due to lower standard errors [32]. In this study, the incorporation of the ICC along with the help of uniformity tests facilitated the selection of the mixed-effect hurdle model as the appropriate model for the considered data.

Fig 3. Estimated incidence rate ratio (IRR) of having ANC visits (red dot and confidence line) and odds ratio (OR) of not attending any ANC visit (blue dot and confidence line) with 95% confidence interval (CI) from the hurdle negative binomial regression with random intercept at both count-and zero-part (MHNBR. ERE) models.
https://doi.org/10.1371/journal.pone.0228215.g003 Based on the selected hurdle regression model, women living in the Khulna and Rangpur divisions had a significantly lower probability of attending no antenatal care, compared to those living in the Sylhet and Chittagong divisions, while women from the Khulna and Rangpur divisions also had significantly higher frequency of ANC visits compared to women from the Sylhet and Chittagong divisions. These findings are highly supported by the findings obtained in a very similar study on ANC visits by Rahman et al. [33]. The findings may suggest that large-scale maternal and neonatal health programs worked properly in the economically poor Khulna and Rangpur regions compared to the economically rich Chittagong and Sylhet regions [34]. Another explanation could be worse access to maternal health services for the women who live in the remote hill-tract areas of the Chittagong division [35] and in the haor areas (a wetland ecosystem in the north-eastern part of Bangladesh) of the Sylhet division. Guliani, Sepehri and Serieux [22] showed that women living in urban settings are more likely to attend prenatal care and have a higher frequency of visits compared to their counterparts, based on ANC data of 32 developing countries including Bangladesh. The results from the present study also indicate that women residing in urban areas have a higher frequency of ANC visits than those in rural areas. However, in our multivariate analysis, place of residence did not have a statistically significant influence on the woman's attitude to use ANC during pregnancy. The difference between frequency and use by urban-rural settings may arise mainly from the attitude of married adolescent women living in rural areas, who are less likely to use skilled maternal health services than those residing in urban areas [36][37][38]. The higher IRR for urban areas supports the idea that availability of health care centres has increased the access to maternal health services for urban women compared to the rural women. Moreover, women living in urban areas are relatively more educated, are more aware of health, and have more decision-making power on their own healthcare issues compared to women living in rural areas.

Factors Category Count-part (Number of ANC visits) Zero-part (No ANC attendance)
A positive association between the ANC utilization and the household wealth status has been found in many studies on ANC use [22,33,39]. This positive association does not vary by urban-rural settings [40]. The estimated OR and IRR in this study indicate that the probability to take no ANC and the frequency of ANC visits both increased with increasing household wealth status. A possible explanation could be that women who belong to well-off families usually have proper education, access to mass media, and an ability to spend more money to take frequent ANC visits compared to women from poorer families.
The findings of this study showed that women who have access to mass media at least once a week are less likely to keep away from ANC visits and have more ANC visits. Some studies on ANC utilization support this finding, particularly for women living in rural [39] and slum areas of big cities like Rajshahi [41] and Dhaka [42]. Mass media broadcast different sorts of health-related programs and news that make women aware of their well-being and the wellbeing of their unborn baby.
The likelihood of prenatal care attendance and the frequency of ANC use are both positively associated with the level of women's education and the influence of education is more pronounced for seeking prenatal care than the number of ANC visits [22]. The current study also found that the level of education had a stronger impact than other factors on both the use and frequency of ANC visits. Educated women took more ANC since they have more knowledge of the benefits of frequent ANC visits such as a reduction of pregnancy complications, ensuring safe delivery, and supporting healthy life of the babies. Moreover, they are more knowledgeable about how to find health care.
The findings of this study show that the partner's education also contributes to deciding whether a woman will take ANC, rather than the frequency of ANC visits. The probability of avoiding ANC significantly decreased with an increase of the partner's education status. Rahman, Islam and Islam [41] also found that the husband's education has a significant influence on taking prenatal care. The findings suggest that educated partners may be more concerned with their pregnant wives and the associated pregnancy complications.
The desire of pregnancy has a significant influence on the number of ANC visits in this study, rather than on the decision to seek ANC. Rahman et al. [33] also found that women are more likely to seek care for pregnancy complications when they intended to have the pregnancy. Conversely, when women are unwilling and unhappy about untimely pregnancy, they may be more likely to hide it and less likely to take frequent ANC visits. Hiding behaviour is common among women who live in a more conservative rural environment.
Women empowerment in health care decision-making is also found to be significantly associated with the number of ANC visits rather than seeking ANC. Women who can take decisions by themselves take ANC visits more frequently than their counterparts do. A possible reason behind this finding could be that educated woman living in urban areas (who usually take decision by themselves) are more conscious about their own and their unborn babies health compared to illiterate women who depend on other's decision to seek prenatal care. Hossain and Hoque [11] also found a significant positive influence of women empowerment (measured by education, freedom of choice/movement, household decision making power, and economic activities) on the decision and intensity of utilization of antenatal care in Bangladesh.

Conclusion
The selected hurdle regression model confirms that two processes generate the number of ANC visits in Bangladesh: one process generates zero ANC visits and the other generates the frequency of ANC visits. The significance of the cluster-specific variance component at both the zero and the count part of the hurdle regression model indicates that the community (cluster) has a significant effect on the variation of both the women's decision on prenatal care use and the frequency of ANC visits, although most of the variations originated from women-, household-, and regional-level factors. The findings of the study thus show the necessity of considering community effects (ICC) along with overdispersion and zero-inflation in modelling the ANC data of Bangladeshi women and hence also in identifying risk factors for not attending any ANC as well as for the frequency ANC visits. Though only random intercept models have been investigated in this study, further investigations can be performed to assess the relevance of random slopes in the model. Also, clustering at higher administrative units (such as district and sub-district) can be investigated using three-or four-level models [43]. Moreover, we found that hurdle and zero-inflated type models should be selected carefully since poorer assumptions of one type of (structural) zeros are difficult to derive from real world data. It is better to select the model structure statistically (whether the fitted model can explain all the zeros) rather than based on types of zeros.
The findings of this study might help policy makers to find out which socio-economic and demographic groups should be given priority to encourage women to attend ANC and to have more ANC visits to medically trained personnel during their pregnancy period. This study also suggests that besides improving women's academic education and household wealth, women should be motivated to change their attitude to seek medical care during their pregnancy. The significant cluster-level variation in the developed model also indicates that the goal of reducing maternal death could be achieved if heterogeneity in the prenatal care use and its frequency could be reduced at the community level.