Modelling chronic malnutrition in Zambia: A Bayesian distributional regression approach

Background The burden of child under-nutrition still remains a global challenge, with greater severity being faced by low- and middle-income countries, despite the strategies in the Sustainable Development Goals (SDGs). Globally, malnutrition is the one of the most important risk factors associated with illness and death, affecting hundreds of millions of pregnant women and young children. Sub-Saharan Africa is one of the regions in the world struggling with the burden of chronic malnutrition. The 2018 Zambia Demographic and Health Survey (ZDHS) report estimated that 35% of the children under five years of age are stunted. The objective of this study was to analyse the distribution, and associated factors of stunting in Zambia. Methods We analysed the relationships between socio-economic, and remote sensed characteristics and anthropometric outcomes in under five children, using Bayesian distributional regression. Georeferenced data was available for 25,852 children from two waves of the ZDHS, 31% observation were from the 2007 and 69% were from the 2013/14. We assessed the linear, non-linear and spatial effects of covariates on the height-for-age z-score. Results Stunting decreased between 2007 and 2013/14 from a mean z-score of 1.59 (credible interval (CI): -1.63; -1.55) to -1.47 (CI: -1.49; -1.44). We found a strong non-linear relationship for the education of the mother and the wealth of the household on the height-for-age z-score. Moreover, increasing levels of maternal education above the eighth grade were associated with a reduced variation of stunting. Our study finds that remote sensed covariates alone explain little of the variation of the height-for-age z-score, which highlights the importance to collect socio-economic characteristics, and to control for socio-economic characteristics of the individual and the household. Conclusions While stunting still remains unacceptably high in Zambia with remarkable regional inequalities, the decline is lagging behind goal two of the SDGs. This emphasises the need for policies that help to reduce the share of chronic malnourished children within Zambia.


Introduction
The burden of child malnutrition still remains a global challenge, with greater severity being faced by low-and middle-income countries [1][2][3]. Globally, malnutrition is amongst the most important risk factors associated with illness and death, affecting hundreds of millions of pregnant women and young children [3][4][5][6]. Stunting in early childhood is strongly associated with numerous short-term and long-term consequences, including increased childhood morbidity and mortality, delayed growth and motor development and long-term educational and economic consequences later in life [7]. Undernourishment causes children to start life at mentally suboptimal levels [8].
Assessment of childhood malnutrition commonly relies on standard anthropometric measures for insufficient height-for-age (stunting) indicating chronic undernutrition, insufficient weight-for-height (wasting), indicating acute undernutrition; and insufficient weight-for-age (underweight), an indicator commonly used to asses, both, chronic, and acute undernutrition [9,10].
Anthropometric measurements are practical techniques for assessing children's growth patterns during the first years of life. The measurements also provide useful insights into the nutrition and health situation of entire population groups. Anthropometric indicators are less accurate than clinical and biochemical techniques in assessing individual nutritional status. However in resources limited settings, the measurements are a useful screening tool to identify individuals at risk of undernutrition, who can later be referred to subsequent possible confirmatory investigation [11].
It is estimated that globally 52 million children under-five years of age are wasted, 17 million are severely wasted and 155 million are stunted. Around 45% of deaths among children under-five years of age, most of which occur in the sub-Saharan Africa are linked to undernutrition [3,9]. It is also estimated that four out of ten children under the age of five in Zambia are stunted [12]. This paper will therefore focus on childhood stunting in Zambia.
Global prevalence of stunting in children younger than five years declined during the past two decades, but still remain unacceptably high in South Asia and sub-Saharan Africa regions [5]. If current trends remain unchecked, projections indicate that 127 million children under five years of age will be stunted in 2025 [1]. There is therefore need to heighten various interventions in these affected region and to investigate possible area specific determinants of stunting.
There are already fairly well documented perspectives on determinants of malnutrition. The treatise on these determinants mainly relies on the United Nations Children's Fund (UNI-CEF) conceptual framework on malnutrition which has evolved over time as more knowledge and evidence on the causes, consequences and impacts of undernutrition is generated. The framework distinguishes between immediate, intermediate and underlying determinants of malnutrition [5,[13][14][15].
The immediate causes of undernutrition include inadequate dietary intake and disease, while the underlying causes could include household food insecurity, inadequate care and feeding practices for children, unhealthy household and surrounding environments, and inaccessible and often inadequate health care. Basic causes of poor nutrition encompasses the societal structures and processes that neglect human rights and perpetuate poverty, constraints faced by populations to essential resources [13].
Several studies done within sub-Saharan Africa investigated determinants such as the mother's level of education, income levels and these factors have been linked to malnutrition [9,12,16,17]. The source of the drinking water, the wealth of the household, the area of residence, age of the child, the sex of the child, the breastfeeding duration, the age of the mother has also been investigated and were observed to be significant correlates of stunting [12,18]. Within Zambia, stunting was observed to be more likely among children of less educated mothers (45%) and those from the poorest households (47%) [19]. The determinates of malnutrition are related to each other and the differences and direction between these levels of determinism as indicated in the UNICEF framework are often not discrete but in reality related. As discussed by Kandala [17] for example, the mother's level of education might be influencing child care practises-an intermediate determinant-and the resources available to the household-an underlying determinant.
Previous studies elsewhere have observed that stunting tends to show regional variation [4,9,16]. We see this trend in Zambia as well, where the decline of stunting has been only gradual and unacceptable, with higher prevalence in Northern province where 50% of the children being stunted, and stunting being less common in Lusaka, Copperbelt, and Western provinces where 36% of children are stunted [19]. We see this regional variation of stunting in Fig 1  which shows stunting in Zambia in the 2007 and 2013/14 waves of the Zambian Demographic and Health Surveys (ZDHS). The ZDHS is a national-wide survey which is representative at a sub-national level and contains information on trends in fertility, childhood mortality, use of family planning methods, and maternal and child health indicators including HIV and AIDS [19]. The figure shows the height-for-age z-score, with Western province better than Northern province for the 2007 wave. We see a slight difference in 2013/14 as stunting seemed to get worse in parts of the Western province.
Much of the work done on the determinants of stunting in Zambia, have considered socioeconomic characteristics and have assessed the linear effects of these determinants on the conditional mean [12,20], using models specifications such as; linear models, generalized linear models (GLMs) and generalized additive models (GAMs) [21]. These aproaches are useful and have the advantage of being easy to estimate and to interpret. However, they may risk model misspecification and draw inaccurate estimates, when heterogeneity is present, or when extreme values in the response are present and when a linear relationship is not plausible. In the analysis of certain outcomes, like stunting for example, the interest is not only in the conditional mean, but also in extreme values (height-for-age z-scores), or other parameters of the response. Quantile regression is one possibility to model beyond the conditional mean, with the interest to show variation of the outcome at a quantile level, without making any assumptions of the response distribution. This method for example has been applied in child malnutrition studies [16]. However, distributional regression offers advantage over quantile regression, as it provides the possibility to characterize the complete probabilistic distribution of the response in one joint model [21,22]. Moreover, distributional regression is more efficient, if prior knowledge on specific aspects of the response distribution is available, or can be estimated [23]. Furthermore, the characterization of the whole distribution of the response is more informative.
The study by Kandala [24] which focused on stunting in sub-Saharan countries found that there are distinct spatial patterns of malnutrition that are not explained by the socio-economic determinants or other well-known correlates alone [16,25]. As such, our study includes spatial covariates, since we aim at investigating spatial differences of stunting in Zambia at sub-district level while jointly analysing socio-economic and environmental characteristics.
The following three reasons make our study novel compared to previous work [18]. Firstly, we jointly analysed remote sensed data and socio-economic covariates at sub-national level. This is made possible due to availability of georeferenced data at the primary sampling unit a household pertains to in the recent two ZDHS datasets. The Demographic and Health Surveys rely in most cases on a two-stage survey, and the primary sampling units corresponds to the enumeration areas from the most recent completed census that has been selected. Georeferenced data is important as it generates more specific information which can facilitate targeted interventions. Secondly, we used Bayesian distributional regression which allows us to model all parameters of the underlying response distribution. Lastly, we used two waves of the demographic health surveys to control for spatio temporal interactions. Therefore, this study demonstrates small area variation in stunting in Zambia and analyse possible inequalities and deprivation at the sub-district level.

Socio-economic and georeferenced covariates
We used data from the 2007 and 2013/14 ZDHS. The ZDHS is a national-wide survey which is representative at a sub-national level and contains information on trends in fertility, childhood mortality, use of family planning methods, and maternal and child health indicators including HIV and AIDS. For these population health indicators, data is collected for women aged 15-49, men aged 15-59 and children below five years of age [19].
The ZDHS provide besides information on the district a household pertains to, also information about the geolocation of the primary sampling unit a household belongs to, and from which the data was collected. The location of the primary sampling unit is the spatial information used in the empirical analysis. During data processing, GPS coordinates are displaced to ensure that respondent confidentiality is maintained. The displacement is randomly applied so that rural points contain a minimum of 0 and a maximum of 5 km of positional error. Urban points contain a minimum of 0 and a maximum of 2 km of error. A further 1% of the rural sample points are offset a minimum of 0 and a maximum of 10 km [26].
Demographic Health Surveys have documented weakness for estimation of individual anthropometric measurements. Potential threats to high data quality may occur across various research stages, from survey design to data analysis. There is also often a substantial amount of missing or implausible anthropometric data across surveys [27].
Furthermore, there is caution over the use of stunting as an individual classifier in epidemiologic research or its interpretation as a clinically meaningful health outcome. Stunting should be used as originally designed to be from its original use as a population level indicator of community well-being [28], as it reflects past health and nutrition conditions; and an indication of socio-economic development of a country [1].
Despite the above highlighted limitations of DHS and anthropometric indicators, they remain useful national wide measurements that can be used to estimate child health. Moreover, in general anthropometric measures are a good indicator for planning as they can provide a lot of information to policy makers to answer, how, where and which type of intervention would be favourable in specific settings.

Socio-economic and spatial determinants
The effects of socio-economic factors, such as the education of the mother, household size, wealth of the household on the health status of children are well documented [12,29]. We calculated an index representing the wealth of the household based on the household's assets using Principal Components Analysis (PCA) following Filmer and Pritchett, and Sahn [30,31]. Previous studies have shown that household wealth status was a predictor of childhood malnutrition. Children from poor households are more likely to be stunted than those from richer households [29].
In our analysis we investigated the impact of different socio-economic factors, which impact on height-for age Z-score has been discussed in literature. Table 1 gives an overview and the according references.

Remote sensed covariates
We obtained remote sensed data on drought severity, malaria incidence, and population density. The description, and source to these data sets is provided in Table 2.
For example, the malaria incidence data was obtained from the Malaria Atlas Project (MAP). The project collects malaria data on malaria cases reported by surveillance systems, nationally representative cross-sectional surveys of parasite rate, and satellite imagery capturing global environmental conditions that influence malaria transmission [35].

Methodology
We assessed the relationships between socio-economic and remote sensed characteristics and anthropometric outcomes using the Bayesian Distributional Regression (BDR). BDR models all parameters of the response distribution based on structured additive predictors and allows to incorporate for example, non-linear effects of metric covariates, spatial effects, or varying effects. Applications of structured additive regression models to topics in Global Public Health are found in several publications [39][40][41][42]. This approach permits us to fully analyse the whole distribution [41,43] and our analysis was not restricted to assessing the conditional mean of the height-for-age z-score. Instead suspected heterogeneity across socio-economic and georeferenced factors and the anthropometric measure can be directly captured. In the context of growth failures this is of particular importance, as previous studies highlighted high levels of heterogeneity related to growth failures [33].

Bayesian distributional regression
Relying on Bayesian distributional regression requires to specify the distribution of the response variable. Assuming the response distribution to be Gaussian permits to model besides the conditional mean also the variance or standard deviation of the response variable. Graphical analysis using amongst others randomised quantile residuals [44] strengthens that a Gaussian model is plausible. See also Fig 2, for more details. In the left-hand panel of Fig 2 the histogram of the height-for-age z-score together with the underlying density illustrates why the normal distribution seems to be an appropriate choice. This is further confirmed in the second and third panel, where the histogram of the quantile residuals including the underlying kernel density estimate, respectively, the QQ-plot of the randomised quantile residuals are shown. Table 1. Included covariates, their source, and effect on the height-for-age z-score found in the literature.

Covariate
Used data source Effect on stunting found in literature Reference Asset Index DHS Household wealth inequality associated with childhood stunting [29] Age mother at birth DHS Increasing non-linearly [16] Age child DHS Decreasing non-linearly [17] Birth order DHS Being born forth or higher significantly more stunted [16] Breastfeeding duration DHS Breastfeeding interval � associated with low level of stunting [25] Education mother DHS Stunting improves non-linearly with the educational level [32] Household size DHS Increases linearly [12] Mothers' BMI DHS U-shape relationship with childhood stunting [17] Number of vaccinations DHS Lower levels of stunting when fully vaccinated [25] Drought severity index See Table 2 Not further specified [33] Malaria incidence See Table 2 No clear pattern [34] Population density See Table 2 Not further specified [33] https://doi.org/10.1371/journal.pone.0255073.t001 Assuming the response distribution of the height-for-age z-score to be Gaussian, both the mean μ and the standard deviation σ are related to a structured additive predictor. Accordingly, the z À score is � N ðm is ; s i Þ can be specified as Gaussian response, with i = 1, . . ., I being the number of children, and s = 1, . . ., S be the location of the primary sampling unit the child pertains to. Using the same notation as in the methodology manual of BayesX the regression model can be written as follows [45]: Here both parameters of the normal distribution the mean μ and the standard deviation σ are related to the set of covariates specified further in Tables 1 and 2. Accordingly, the response functions h μ , and h σ link the two parameters to their structured additive predictors which is specified as follows:

PLOS ONE
where f 1 (�) to f 10 (�) are potential non-linear effects of socio-economic and remote sensed covariates approximated using Bayesian penalised Splines (P-Splines) first described by Lang and Brezger [46]. Bayesian P-Splines are based on P-Splines as introduced by Eilers and Marx [47], and use an approach based on basis functions. As smoothness priors of the unknown regression parameters β j a random walk prior of the form β j jg 2 j / exp À 1 is specified, with g 2 j being random variance parameters and K j is an appropriate penalty matrix, see also Lang and Brezger [46] for an elaborate discussion. Relying on Bayesian P-Splines allows to incorporate different model terms such as non-linear effects for continuous variables, varying coefficients [48], or spatial effect. In addition, Bayesian P-Splines allow to decompose the predictor additively and are known for having good mixing properties. See S1 to S8 Figs in the Supporting Information for convergence diagnosctics of the sampling paths of the parameters included in the final model that is based on a Markov chain Monte Carlo (MCMC) algorithm. f 11 (�) is the spatio-temporal effect included in the model to account for unexplained heterogeneity by incorporating a Markov random field prior [46]. In more detail, following Lang and Brezger [46], the basic Markov random field prior for the regression coefficients β s of the spatially correlated effect f s is defined as follows: Where N s is the number of neighbouring sites of location s, s 0 belongs to the set of neighbouring sites δ s of location s, and t 2 s being the spatially adaptive variance parameteres [47]. Incorporating the spatial effect on a Markov random field proposal allows to account for the remaining heterogeneity that is not explained by the included covariates. See also Chapter 4 of the methodology manual of BayesX [45], and Seiler and colleagues [42] for a more elaborate discussion on the incorporation of the spatio-temporal effect based on a Markov random field prior. This variables are further specified in Table 1. x 0 β subsumes the vector of included effect coded categorical covariates that are the gender, the place of living, and the survey wave. For an illustration of effect coding see for example Fahrmeier and colleagues [49]. From a Bayesian perspective the categorical variables are considered to be random variables for which a diffuse prior of the form p(γ j ) / const is assigned [46].

Model selection
The fit of the models are compared by relying on the Deviance Information Criterion (DIC) [50] and Widely Applicable Information Criterion (WAIC) [51], and are summarised in Table 3. As a rule of thumb can be seen that the model with the lowest value describes the data best. We specified six distinct models, aiming to identify the importance of, for instance, socio-economic or georeferenced factors. In more detail, the differences between these models are summarised in Table 4.

PLOS ONE
Modelling chronic malnutrition in Zambia: A Bayesian distributional regression approach

Results
In the following Section we will discuss the results of Model 5, omitting insignificant terms, as Model 5 has both the lowest DIC and WAIC. Result of the included covariates are however, similar throughout all specifications. Table 5 shows the baseline characteristics of selected covariates in the population between the two ZDHS survey of 2007 and 2013/14, and remote sensed data aggregates for these waves. Data was available for 25,852 children from the two waves, 31% observation were from the 2007 and 69% were from the 2013/14 ZDHS. Levels of stunting decreased between 2007 and 13/14 from a mean z-scores of -1.59 CI(-1.63; -1.55) to -1.47 CI(-1.49; -1.44). The breastfeeding duration declined from 16.22 to 15.69. There was a notable increase in the number of received vaccinations by children from 5.6 to 7.5 vaccinations. There was a slight increase in the number of years the mother spent in school from 7.2 to 7.8. Malaria incidence rates (plasmodium falciparum incidence) declined from 26% to 20%. Night-time light increased from 2.75, to 3.72 (observed values were log transformed), a possible indication of increase in urbanisation. Night-time light was highly correlated (ρ = 0.73) to population density as such it was omitted in subsequent analysis.

Descriptive analysis
High disparities in the height-for-age z-score have been observed at the district and provincial level within Zambia. There was a drift in the spatial pattern of malnutrition in the 2013/14 wave compared to the previous survey, indicating a general improvement. See also Fig 1 for a more detailed, descriptive, analysis of the spatial patterns of the height-for-age z-score within Zambia.
We observed that in the 2007 wave, stunting was lowest in the Western and Muchinga province. In the Southern province generally, low values were also observed, except for the Sinazongwe district. For Eastern province, Nyimba, Katete, Petauke and Lundazi districts had high levels. In the Luapula province, high levels of stunting were observed in the districts of x 0 β y y y y y y f 1 (Asset index) y y y n y y f 2 (Birth order, Age mother at birth) y η μ , n η σ y η μ , n η σ y η μ , n η σ n y η μ , n η σ y η μ , n η σ f 3 (Age child, Breastfeeding duration) y η μ , n η σ y η μ , n η σ y η μ , n η σ n y η μ , n η σ y η μ , n η σ f 4 (Education mother) y y y n y y f 5 (Household size) y y y n y y f 6 (BMI mother) y y y n y y

Linear effects
With respect to the linear effects, Table 6 shows the effect of the gender and the area of residence on the posterior mean of the response variable. Considering the posterior mean of the height-for-age z-score of -1.70, boys were found to more stunted compared to girls. Stunting was also found to be higher in children from rural households compared to urban areas. Two patterns well documented in the literature for other countries and also Zambia [12,17,52].  Table 2 for detailed information); calculations by authors. https://doi.org/10.1371/journal.pone.0255073.t005

PLOS ONE
Modelling chronic malnutrition in Zambia: A Bayesian distributional regression approach Socio-economic characteristics Fig 3 shows the non-linear effect of the asset index, and the years of education of the mother on the mean μ and standard deviation σ of the response variable. Undernutrition has been associated to poverty [4], we observed that children living in poor household showed worse outcomes compare to children living in wealthier households, i.e the z-score is linearly increasing with increasing asset index. The effect of the asset index on the standard deviation does not notably vary across the range of the asset index, indicating a homogenous effect of wealth. The bottom panel of Fig 3 shows that an increase in the level of education of the mother above eight years of education is associated with an appreciable increase in the height-for-age z-score. Notably also is that, there is not much difference in this effect for mother's education less than 8 years. This entails that primary school education does not improve the nutrition outcome of the children as much. Above eight years of schooling, we see clearly that increase in the years has positive effect on the z-score. Moreover, the variation in the height-for-age zscore is higher with less years of schooling, whereas the variation gradually decreases with increasing levels of education of the mother. Fig 4 shows the non-linear effect of the number of vaccinations the child received and mother's BMI on the mean and the standard deviation z-score. The top graph shows that there was a positive effect on the mean z-score with increase in the number of vaccinations the child received. There is also greater variation in stunting levels among children who received less than 2 vaccinations.
Low values of the mothers BMI are negatively associated with the height-for-age z-score of the child, while for increasing values of the BMI also an increase in the posterior mean of the z/score can be observed. For values above 40 for the BMI of their mother the results are inconclusive indicated by the widening of the credible intervals. Low values of the BMI of the mother are associated with less variation compared to high values. Fig 5 shows that increasing malaria incidence about 0.3 had a negative effect on the z-score, however we do not see any meaningful differences in the standard deviation over the spectrum the malaria incidences.
Due to the high correlation of breastfeeding and age of the child, an interaction between these two variables can be presumed for which one has to account for. Fig 6 shows that children below 12 months of age who were breastfeed, were not malnourished. Accordingly, malnutrition mostly seems to be a process that comes to effect as children grow. Stunting was high for children above 36 months of age and who were breastfeed. On the right side, the figure shows that stunting was low in children whose mothers were around 30 years and with respect to birth order, which emphasizes that especially children of very young mothers are those most vulnerable. Fig 7 emphasizes the pronounced north and south pattern after adjusting for all the other variables, in particular after adjusting for wealth and rurality which was already described in the descriptive analysis. The highest variation in the height-for-age z-score was also observed in north (Luapula province) in both waves as can be seen on the right side of the figure.

Discussion & conclusion
Using the two waves of the ZDHS, we modelled the height-for-age z-score by using socio-economic and remote sensed information. To analyse the whole distribution and not just focusing on the conditional mean, we used a Bayesian distributional regression approach accounting for heterogeneity as well.

PLOS ONE
Using Bayesian distributional regression, we assessed the relationship of socio-economic, and remote sensed covariates and stunting. Bayesian distributional regression, presents an advantage in terms of model flexibility allowing to incorporate, amongst others, non-linear effects and spatial effects. This however comes also with the drawback of data intensity and computational complexity.

PLOS ONE
Remote sensed techniques can be useful for future research on community health assessment as these techniques provide an advantage to take measurements quickly for remote and hard to reach areas. The data also enable to make meaningful analyses at sub-national levels which can improve targeting of interventions due to high levels of geographic specificity [26,33], however they do not give a full picture. Therefore, it is important to account for other covariates such as socio-economic characteristics at the individual or household level.
When relying on remote sensed information to asses anthropometric measures or biophysical developments, great caution should be taken with respect to data quality. Our study finds that remote sensed covariates alone explain little of the variation of the response, this emphasizes the need to control also for socio-economic characteristics. We find that the combination of remote sensed data and socio-economic characteristics explain more of the variation of the response, compared to solely focusing on one of the two sources of explanatory variables. In addition this also highlights the strong influence of socio-economic covariates or can be seen as an indicator of poor quality of the available remote sensed information.
Clear non-linear patterns emerged with respect to the years of education of the mother, and number of vaccinations. There was a clear non-linear tendency among children whose mothers had up to eight years of schooling having a low height-for-age z-score. For children of mothers with secondary or higher education the height-for-age z-score starts to improve strongly. This trend is consistent with what has been observed in others studies were odds of stunting were higher among children from mothers who had few years of education [12,52] and lowest among those who had advanced years in education [52]. Higher education level has been associated with increased income levels and improved knowledge among mothers who are usually the primary caregivers. As such educated mothers are more likely to take better care of their children by making informed nutritional decisions [24,52,53]. Increasing number of vaccinations showed improved z-score among children. Even though the effect was significant, the same size of the effect might not be relevant in practice.
Moreover, considering the full distribution like we did shows that the variation is highest for low levels of education and decreases with increasing years of education. This study did not consider the association of paternal education and the child z-score, however in another study, it was found that it was Maternal education that had a positive impact on children's nutritional status [17].
We observe differences in levels of malnutrition in various regions in Zambia. One consistent pattern is that of discrepancy between the rural areas which are worse off compared to urban areas and confirms socio-economic inequalities between rural and urban areas. This may suggest social and economic inequalities between such areas. This has already been documented in other studies [12,14,24]. Furthermore, in terms of a spatial distribution, when you consider a smooth spatial effect, there is a clear regional variation in addition to the effect of rurality. Even after accounting for economic activities, the farming southern regions tend to be well off compared to the more industrialised northern areas. There is need to investigate further the underlying factors that contribute to the variation in the height-for-age z-score.
The present study shows that stunting still remain high in Zambia with remarkable regional inequalities and the decline is gradual which is unacceptable. There is need therefore to address the socio-economic indicators if this status is to improve.