Understanding Child Stunting in India: A Comprehensive Analysis of Socio-Economic, Nutritional and Environmental Determinants Using Additive Quantile Regression

Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.


Introduction
Child undernutrition is the cause of one third of deaths in children under five [1]. It produces serious health, social and economic consequences throughout the life course [2][3][4] as well as across generations [5], making it the leading risk factor among children under five worldwide [6]. Low height-for-age or stunting reflects a failure to reach linear growth potential, and is a key indicator of chronic undernutrition. Globally, depending on the precise definition and estimate used, between 171 million [7,8] and 314 million [9] children under five are currently classified as stunted, with 90% of this burden occurring in 36 African and Asian countries [1]. Between 1985 and 2011 the prevalence of moderate-to-severe stunting has declined from 47% to 30% [9], but progress has been highly uneven, and stunting rates in the most affected world regions have largely remained static [9,10].
To date, most of the large-scale programmes to address stunting have fallen behind expectations. Systematic reviews of the effectiveness of some of the major nutrition interventions, such as promotion of breastfeeding [11], promotion of complementary feeding through education or food provision [3,12,13,14], and supplementation with single or multiple nutrients [15,16], usually show significant impacts on behaviour but modest and context-dependent impacts on height gain or stunting prevalence [17]. Moreover, few children in the developing world currently benefit from optimal breastfeeding practices, or from sufficient dietary diversity and meal frequency [7]. In contrast, the history of most industrialized countries suggests that virtually all stunting can be averted, making the failure to make rapid progress all the more disconcerting. Therefore, it is essential to revisit the assumptions that underlie current intervention practices.
It is broadly accepted that child stunting is the outcome of multiple risk factors. Nevertheless, much of the modelling to assess presumed cause-effect relationships in observational epidemiology and effectiveness research tends to reduce this complex interplay of risk factors through focusing on single risks and interventions. The recent emphasis on the relevance of systems approaches in epidemiology [18][19][20][21][22] implies, however, that the determinants of stunting must be examined in their entirety, if we do not want to risk incorrect estimates of risk factors and interventions as a result of oversimplifications in modelling approaches. Furthermore, it has been suggested that the impact of risk factors (and interventions) on the lower tail of the distribution might differ considerably from their impact on population means [12]; therefore a careful exploration of such differential effects is merited. Finally, the assumption that many ''established'' risk factors exert their effect in a linear way is being challenged by emerging evidence of non-linear effects [23].
In light of the above, this study aims to undertake a comprehensive analysis of the determinants of child stunting, and to explore whether the three above-described common-practice simplifications in modelling approaches are appropriate. More specifically, the objectives are to (i) capture the interconnectedness between multiple risk factors through an integrated analysis, (ii) explore whether differential effects emerge across the height-for-age distribution, and (iii) test whether non-linear effects play a role. To do so, we developed a conceptual diagram of potential determinants, and applied the innovative statistical approach of additive quantile regression with boosting estimation to data from the Indian National Family Health Survey (NFHS). With an estimated stunting prevalence of 51% and 61 million stunted children, India is the most affected country in the world [1] and improvements in the last two decades have been almost negligible [24].

Conceptual diagram and corresponding literature
We pursued an evidence-based approach to mapping the complex interplay of factors that determine whether a child becomes stunted or not. Drawing on the well-known UNICEF framework [1,25] and a priori reasoning, we conducted extensive literature searches and structured our findings in a diagram of immediate, intermediate and underlying determinants of child stunting comprising sixteen main groups of determinants (Figure 1). In theory, a comprehensive analysis should consider all of these determinants.
Age and sex are critical non-modifiable factors [3,26]. The most important modifiable immediate causes of stunting are inadequate caloric and nutrient intake and uptake [25]. Intrauterine growth restriction (IUGR) is also known to affect long-term growth and development [27][28][29].
Large families and scarce, poorly distributed resources may limit food access and prompt household food competition. Various studies have found crowding [30], number of children living in a household [27], birth order [30], and birth interval [31,32] to be associated with stunting.
While improved water, sanitation and hygiene practices protect against stunting [33][34][35], indoor air pollution from solid fuel use has been suggested as a risk factor [30,36,37]. Environmental tobacco smoke (ETS) shows positive, negative and null associations depending on the country [38].
Recurrent infections, such as diarrhea [52], acute respiratory infections [53], and helminthes [54,55] along with chronic diseases such as HIV/AIDS [56,57], may also increase risk, as these conditions can reduce appetite, hinder uptake of nutrients or increase metabolic requirements and nutrient loss [58].
Availability, accessibility and affordability of appropriate healthcare during pregnancy, birth, the postnatal period and continuing into childhood [59,60] determines a health system's ability to prevent, diagnose and treat chronic undernutrition [61].
Stunting prevalence varies widely both between [69] and within countries [78]. Relevant regional characteristics include urban/rural location and the capacity to produce food (e.g. local climate, land use [66,79]) and to distribute food (e.g. road infrastructure, markets). Population growth, land degradation and increasing climate variability are all predicted to strain food production and increase the burden of child undernutrition [80].

Data and variables
We used data from the Indian NFHS for the years 2005/2006, a large, well-established, nationally representative survey based on a multi-stage cluster sample design that provides high-quality information on the health and nutrition of women and children [81]. The National Family Health Survey is the Indian equivalent of the Demographic and Health Surveys, a series of standardised surveys which are routinely conducted in more than 70 developing countries. All data are in the public domain and can be downloaded, after registration, from http://www.measuredhs.com. In our analysis we focused on children aged 0-24 months, as stunting prevalence progressively increases until it reaches a plateau at around 24 months [1,3,26] and as it becomes very difficult to reverse stunting after this critical time window [82]. Stunting is measured by a Z-score of standardized height-for-age according to the WHO child growth standards [83]; stunted or severely stunted children are those with a Z-score below -2 or -3, respectively [1].
Figure 1 served as a basis for identifying relevant variables within each group of determinants; all variables, as well as their definitions and empirical distributions in the final dataset, are shown in Table 1. We carefully investigated all potential variables to populate a determinant from the diagram and chose suitable variables or proxies based on descriptive statistics. The final dataset contains variables to populate most groups, but measures or suitable proxies of IUGR, nutrient intake and uptake, chronic diseases and recurrent infections were not available. For other groups, we could not assess all characteristics of interest, for example in relation to maternal psychosocial health, zinc and ETS. We examined various measures of curative and preventative healthcare, e.g. possession of a health card, a health facility visit in the past three months, and care-seeking for episodes of respiratory infection or diarrhoea during the two weeks preceding the survey. We ultimately settled on the number of antenatal visits as a proxy for care during pregnancy and childbirth, and constructed a vaccination index based on vaccinations against measles, polio, tuberculosis (BCG) and diphtheria, pertussis and tetanus (DPT) as a proxy for care during childhood.
We constructed a three-level variable for breastfeeding and two variables for complementary feeding (Table 1). Food quantity was assessed as meal frequency in the previous 24 hours. Food diversity was measured as the number of food groups a child had consumed in the previous 24 hours, with eight groups defined as food made from grains; food made from roots; food made from beans, peas, lentils, nuts; fruits and vegetables rich in vitamin A; other fruits and vegetables; meat, fish, poultry, eggs; cheese, yoghurt, other milk products; and other food [84]. Grouping of both complementary feeding variables was based on empirical frequencies in our dataset to obtain sufficiently large group sizes.
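The Z-score cut-offs quoted above translate directly into the two binary outcomes used later in the logistic regression models. The following is a minimal sketch of that dichotomization; the function name and the example values are illustrative assumptions and are not taken from the NFHS data.

```python
def stunting_indicators(z_score):
    """Dichotomize a height-for-age Z-score with the cut-offs quoted in the text.

    A child is counted as stunted if the Z-score is below -2 and as severely
    stunted if it is below -3, so severely stunted children are also stunted.
    """
    return {
        "stunted": z_score < -2.0,
        "severely_stunted": z_score < -3.0,
    }


# Illustrative (hypothetical) Z-scores, not survey values.
for z in (-3.4, -2.5, -1.1):
    print(z, stunting_indicators(z))
```

Note that both comparisons are strict, so a Z-score of exactly -2 is not classified as stunted in this sketch.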
We defined our study population as the youngest child aged 0-24 months living in each household; children who were not de jure residents were excluded, as several determinants relate to the household environment. Starting from 17 039 children, we excluded 2779 children due to missing outcome and 2084 due to missing covariates. The latter were mainly attributable to eight covariates with 50 or more missing values: caste (640 missing values), partner's occupation (212), partner's education (165), drinking water (50), vaccination index (280), number of antenatal visits (153), vitamin A (450), and iodine (118). Our final dataset comprised 12 176 observations; the proportion of missing data was thus about 29%.
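The sample-size figures in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable names are our own.

```python
initial_children = 17039
missing_outcome = 2779
missing_covariates = 2084

# Children remaining after both exclusion steps.
final_sample = initial_children - missing_outcome - missing_covariates

# Share of the initial sample lost to missing outcome or covariate data.
missing_share = (missing_outcome + missing_covariates) / initial_children

print(final_sample)                    # 12176
print(round(100 * missing_share, 1))   # 28.5, i.e. about 29%
```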

Statistical modelling
We undertook additive quantile regression based on boosting estimation [85], an innovative statistical approach that allows the three underlying research objectives to be investigated simultaneously.
• Quantile regression models quantiles of the outcome as a function of covariates, and therefore enabled us to explore whether covariates exert differential effects across the Z-score distribution, in particular towards the lower tail. In contrast, most analyses of the determinants of undernutrition have used logistic regression models for dichotomized versions of the Z-score (e.g. stunted vs. not stunted) or linear regression models for the continuous Z-score.
• The use of an additive predictor allowed us to explore linear, potentially non-linear, age-varying and spatial effects of the numerous covariates in a flexible way. Additive quantile regression extends conventional linear quantile regression by including flexible functional covariate effects in the predictor while maintaining the assumption of an additive structure. For example, the association between a continuous covariate and the outcome is left unspecified before the analysis and its functional shape is then estimated by, e.g., penalized splines. Most analyses to date have ignored the fact that selected covariates may exert their effects in non-linear and age-varying ways.
• Boosting, a computer-intensive inference method for highly complex models, is currently one of the few possibilities to estimate an additive quantile regression model. As boosting combines parameter estimation and variable selection in one single step, a large number of covariates can be included in the model without requiring subsequent steps of variable selection, as would be the case in classical estimation of quantile or logistic regression. Boosting estimation thereby enabled us to capture the complex interplay of multiple risk factors in one single model.
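To make the first point concrete, quantile regression replaces the squared-error loss of linear regression with the asymmetric check (or pinball) loss, whose minimiser is a quantile rather than a mean. The sketch below illustrates this for the simplest case of a model without covariates; the Z-scores are hypothetical values chosen for illustration, and the function names are our own.

```python
def check_loss(residual, tau):
    """Check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_risk(candidate, values, tau):
    """Average check loss when every observation is predicted by `candidate`."""
    return sum(check_loss(v - candidate, tau) for v in values) / len(values)


def best_constant(values, tau):
    """Constant that minimises the empirical check-loss risk.

    A minimiser is always attained at an observed value, so it is enough
    to search over the data points themselves.
    """
    return min(sorted(set(values)), key=lambda c: empirical_risk(c, values, tau))


# Hypothetical height-for-age Z-scores (not NFHS data).
z_scores = [-4.0, -3.1, -2.6, -2.2, -1.9, -1.5, -1.2, -0.8, -0.3, 0.4]

lower_quantile = best_constant(z_scores, tau=0.15)
stunting_quantile = best_constant(z_scores, tau=0.35)
print(lower_quantile, stunting_quantile)
```

Small values of tau penalise over-prediction more heavily than under-prediction, which pulls the fitted value towards the lower tail of the distribution.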
We used the following model to assess the impact of stunting determinants on four quantiles of the Z-score:

$$\eta_i^{\tau} = \beta_0^{\tau} + \beta_1^{\tau} x_{i1} + \dots + \beta_k^{\tau} x_{ik} + f_1^{\tau}(z_{i1}) + \dots + f_p^{\tau}(z_{ip}) + v_{i1}\, g_1^{\tau}(\mathrm{age}_i) + \dots + v_{im}\, g_m^{\tau}(\mathrm{age}_i) + f_{\mathrm{spat}}^{\tau}(u_i)$$

The additive predictor $\eta_i^{\tau}$ models the conditional quantile of the Z-score of child $i$ for the four quantile parameters $\tau \in \{0.05, 0.15, 0.35, 0.50\}$. The values $\tau = 0.35$ and $\tau = 0.15$ were chosen to correspond to the empirical frequencies of stunting and severe stunting in our dataset (see Table 1), where this choice allows results to be compared across quantile and logistic regression models. $\tau = 0.50$ and $\tau = 0.05$ represent the median and an extreme value, respectively.
The flexible additive predictor $\eta_i^{\tau}$ is quantile-specific and comprises linear effects $\beta_0^{\tau}, \beta_1^{\tau}, \dots, \beta_k^{\tau}$ for categorical covariates $x_1, \dots, x_k$, and linear or smooth non-linear effects $f_1^{\tau}, \dots, f_p^{\tau}$ for continuous covariates $z_1, \dots, z_p$. The shapes of the functions $f_1^{\tau}, \dots, f_p^{\tau}$ are determined as linear or non-linear in a data-driven way [86] and estimated by the established approach of penalized splines [87]. Also specified are non-linear age-varying effects $g_1^{\tau}(\mathrm{age}), \dots, g_m^{\tau}(\mathrm{age})$ for different levels of the feeding variables $v_1, \dots, v_m$; these flexible interaction terms allow the meaning and effect of breastfeeding and complementary feeding to vary with age [39]. Further interaction terms were not considered. For the categorical variable $u$, corresponding to 29 Indian states, a smooth spatial function $f_{\mathrm{spat}}^{\tau}$ is estimated based on a Gaussian Markov random field [88] to account for spatial autocorrelation and unobserved heterogeneity.
Model estimation was undertaken separately for each $\tau$ using a component-wise functional gradient descent boosting algorithm [89]. The optimal number of iterations was determined by fivefold cross-validation. The step length was set to 0.2 and each base learner had similar degrees of freedom [90]. Model estimation was repeated on 100 bootstrap samples of the dataset to obtain 95% bootstrap confidence intervals $[\hat{q}_{j,2.5\%}, \hat{q}_{j,97.5\%}]$, where $\hat{q}_{j,2.5\%}$ denotes the estimated 2.5% quantile of $\hat{\beta}_j^{\tau}$, $j = 0, 1, \dots, k$. All analyses were undertaken with the add-on package mboost [91,92] in the statistical software R [93].
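The estimation principle described above can be sketched in a few lines. The following is a simplified illustration of component-wise gradient boosting for a single quantile, not a reproduction of our mboost analysis: it assumes plain linear base learners instead of penalized splines, starts from the median of the outcome (an assumption of this sketch), uses a fixed number of iterations rather than cross-validation, and runs on noise-free toy data. All function and variable names are hypothetical.

```python
from statistics import median


def negative_gradient(y, fitted, tau):
    """Negative (sub)gradient of the check loss with respect to the fitted values."""
    return [tau if yi - fi > 0 else tau - 1.0 for yi, fi in zip(y, fitted)]


def boost_quantile(columns, y, tau, n_iter=300, step=0.2):
    """Component-wise boosting with an intercept and one linear learner per covariate."""
    n = len(y)
    intercept = median(y)            # starting value (assumption of this sketch)
    coefs = [0.0] * len(columns)
    fitted = [intercept] * n
    for _ in range(n_iter):
        u = negative_gradient(y, fitted, tau)
        # Baseline candidate: fit the gradient by a constant (intercept learner).
        mean_u = sum(u) / n
        best_rss = sum((ui - mean_u) ** 2 for ui in u)
        best_j, best_fit = None, mean_u
        # Fit every covariate separately by least squares and keep the best one.
        for j, x in enumerate(columns):
            beta = sum(xi * ui for xi, ui in zip(x, u)) / sum(xi * xi for xi in x)
            rss = sum((ui - beta * xi) ** 2 for xi, ui in zip(x, u))
            if rss < best_rss:
                best_rss, best_j, best_fit = rss, j, beta
        # Update only the selected component, shrunk by the step length.
        if best_j is None:
            intercept += step * best_fit
            fitted = [fi + step * best_fit for fi in fitted]
        else:
            coefs[best_j] += step * best_fit
            fitted = [fi + step * best_fit * xi
                      for fi, xi in zip(fitted, columns[best_j])]
    return intercept, coefs


# Toy data: the outcome depends on x1 only; x2 is an irrelevant covariate.
x1 = [i - 9.5 for i in range(20)]
x2 = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]
y = [1.0 + 2.0 * a for a in x1]

intercept, coefs = boost_quantile([x1, x2], y, tau=0.5)
print(intercept, coefs)
```

Because only the best-fitting component is updated in each iteration, covariates that never improve the fit keep a coefficient of zero, which is how boosting combines estimation with variable selection.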
To allow for a comparison with established approaches to investigate child stunting, we also conducted logistic regression analyses for the binary variables stunting and severe stunting. We specified the same flexible predictor and used boosting estimation as described above for quantile regression. This was done to ensure that the conceptual difference between quantile and logistic regression remained as the only explanation for any discrepancies in results.
Table 2 shows the results of the 35% and 15% Z-score quantile regression; detailed results of the 50% and 5% Z-score quantile regression are available upon request. (Please note guidance on how statistical significance was assessed in our analysis.) Table 3 summarizes the results of logistic regression for stunting and severe stunting. All findings on effects of single variables described in text, tables and figures are fully adjusted for other variables.
Here, we focus on the results of the 35% Z-score quantile regression, which corresponds to the empirical frequency for stunting (37%) in our dataset and therefore allows the results to be compared with those of logistic regression for being stunted. Importantly, except for the indoor air pollution group, at least one variable in each of the eleven assessed groups of determinants shows a statistically significant association with the 35% Z-score quantile. With respect to our research objectives, this suggests that an integrated analysis of the multiple immediate, intermediate and underlying determinants of stunting is merited.
Table 2 shows the effects for categorical covariates and their 95% bootstrap confidence intervals, and summarizes the shape of the function for continuous variables. The following categorical covariates have at least one significant category compared with the reference category: child sex, household wealth, caste of household head, mother is currently working, child is twin, sanitation facility, vaccination index, vitamin A and iodine. For example, the 35% Z-score quantile for children from the richest households is significantly increased (see Table 2).
Non-linear functions are estimated for maternal age and BMI, birth order, preceding birth interval and the number of antenatal visits (Figure 2). The effect of maternal age increases linearly until 30 years, then remains constant and gradually decreases from 45 years. Height-for-age increases monotonically with greater maternal BMI, with the slope reducing at 25 kg/m². Birth order shows a linearly decreasing effect until the 6th child and then remains constant, while lengthening the interval between births is associated with increased height-for-age up until 100 months. The effect of the number of antenatal visits has a slight inverse U-shape, where low and high numbers of antenatal visits are associated with smaller 35% quantiles than medium numbers (8-15 visits). With respect to our research objectives, the observed non-linear functions emphasize that selected determinants of stunting exert their effects in non-linear ways.
Figure 3 depicts the age-varying effects of feeding variables. The effect of breastfeeding on the 35% Z-score quantile clearly varies with age: any breastfeeding compared to no breastfeeding exerts a positive effect until 9 months followed by a negative effect beginning at 12 months; the increasing effect of exclusive breastfeeding after 14 months is based on small numbers and shows large variation. Compared to low food diversity, high diversity exerts a significantly negative effect until the age of 12 months, and a significantly positive effect thereafter; medium food diversity does not differ significantly from the reference category. No significant differences in relation to meal frequency are observed.
Figure 4 displays the empirically observed 35% Z-score quantiles for 29 Indian states, showing stark differences in stunting (Figure 4a), and the estimated spatial effect on the 35% Z-score quantile for state of residence (Figure 4b). Less pronounced differences in Figure 4b compared to Figure 4a imply that model covariates offer a partial explanation for regional differences.

Results
There are no fundamental differences between the results for the 35% Z-score quantile and those for other quantiles (see Table 2 for 15% Z-score quantile). The majority of categorical, continuous and age-varying variables described above also show significant effects of the same direction and of a similar size for the 15% and 50% Z-score quantiles; for the extreme 5% Z-score quantile, some of these variables are no longer significant. Two categorical variables, however, only show statistical significance in analyses of one quantile: mother is working (35% Z-score quantile) and main cooking fuel (15% Z-score quantile). The above described non-linear effects are very similar across all quantiles, even for the 5% Z-score quantile. The only difference with regard to linearity vs. non-linearity is detected for maternal education (linear for 15% and 35%, non-linear for 5% and 50%).
Likewise, the differences between the results for quantile and logistic regression models are limited (Table 3; please note guidance on interpretation of effect estimates in logistic vs. quantile regression). Most statistically significant variables across the four quantiles also show significance in logistic regression analyses. Exclusive breastfeeding, birth order, number of antenatal visits and vitamin A, however, show no effects on stunting and severe stunting. In contrast, main cooking fuel is statistically significant in both logistic regression models. With respect to our research objectives, the mostly consistent results across different Z-score quantiles and modelling approaches suggest that risk factors do not appear to show differential effects across the height-for-age distribution.
Notes to Table 3: 1 Significant effects are shown in bold; please see Figure 2, footnote 1, on how statistical significance is assessed. 2 The effect of a covariate in logistic regression relates to the log-odds ratio for being stunted or severely stunted (in contrast to quantile regression, where an effect relates to the respective quantile of the Z-score). For example, the log-odds ratio for being stunted for girls compared to boys is −0.080, given that all other covariates are equal.
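Effects on the log-odds scale are easier to interpret after conversion to an odds ratio or a probability. The sketch below does this for an effect of −0.080, which we take as the illustrative value for girls versus boys quoted in the table note; the baseline probability of 0.37 is the overall empirical stunting frequency in our dataset and is used here only as an assumed reference value, not as an estimate for boys.

```python
import math


def odds_ratio(log_odds_effect):
    """Convert an effect on the log-odds scale into an odds ratio."""
    return math.exp(log_odds_effect)


def shifted_probability(baseline_probability, log_odds_effect):
    """Probability obtained by adding an effect to the baseline log-odds."""
    baseline_logit = math.log(baseline_probability / (1.0 - baseline_probability))
    return 1.0 / (1.0 + math.exp(-(baseline_logit + log_odds_effect)))


effect_girls = -0.080          # illustrative log-odds effect (see lead-in)
baseline = 0.37                # assumed baseline probability of stunting

print(round(odds_ratio(effect_girls), 3))
print(round(shifted_probability(baseline, effect_girls), 3))
```

An effect of this size corresponds to an odds ratio slightly below one, i.e. a modest reduction in the odds of stunting.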

Key findings
We employed an evidence-based, systematic approach to identify all likely determinants of child stunting and to capture the interconnectedness between multiple risk factors within the system. For each of the eleven groups of determinants we conceptualized in Figure 1 and were able to populate with variables from the Indian NFHS, we found at least one variable with a statistically significant effect in all quantile and logistic regression models, except for the indoor air pollution group, which only showed a significant effect in three out of six regression models. This emphasizes the broad range of causes of child stunting, encompassing more distal maternal, household socioeconomic and regional characteristics as well as more proximate environmental, nutrition, infection-related and healthcare-related determinants. It suggests many potential entry-points for intervention and offers some insights regarding high-risk groups. Yet, our analysis also implies that a less comprehensive approach may overlook key determinants of stunting, potentially resulting in incorrect effect estimates in analyses of risk factors or leading to interventions that do not sufficiently take context into account.
Looking more closely within groups of determinants, our analysis confirms the importance of child age and sex as non-modifiable determinants and highlights household wealth, greater maternal education and greater maternal BMI as major protective factors, given the large and statistically significant effects of these variables. The findings regarding household characteristics, such as household wealth and maternal education [62][63][64][65], and maternal nutrition status [70][71][72] mirror those in the literature. Our research also draws attention to twins as a potentially overlooked risk group [75]; the very large, significantly negative effect is remarkable, as only 1% of children in the NFHS dataset are twins or multiple births. On the other hand, none of the models detected statistically significant effects of religion of household head, partner's occupation, sex of household head, urban/rural location, number of household members, drinking water, meal frequency by age, or iron supplementation, which contrasts with previous reports [30,33,34,41,42,43,46,47,48,78]. This may be due to the poor quality of the proxy measures we employed or differences in the population distribution of variables [94]. Most importantly, it may reflect the fact that in a more comprehensive model, the effect of some variables is captured by other related variables.
Statistical modelling was realized by additive quantile regression to explore whether differential effects emerge across the height-for-age distribution and to investigate the presence of non-linear effects. The results across the four quantile and two logistic regression analyses were largely comparable, suggesting that the impact of most of the variables on lower tails of the height-for-age distribution does not differ from their impact on the population mean. We attribute this lack of differential effects to the symmetric shape of the height-for-age Z-score distribution, which is independent of covariates. Therefore, using the more established logistic regression instead of quantile regression is likely to be appropriate in most analyses of the determinants of child stunting. Importantly, this research has demonstrated that maternal age, maternal BMI, and birth order exert their effect in a non-linear way; for maternal age and BMI these findings are in line with previous results [23]. Thus, assuming linearity in statistical modelling could lead to incorrect conclusions. To avoid inappropriate oversimplification, we propose that logistic or quantile regression models of stunting determinants should take a systems-based approach to analysis and explicitly consider potential non-linear effects.

Strengths and limitations of this study
Data quality. An inherent limitation of cross-sectional data is their snapshot nature, which makes establishing a temporal sequence of events and drawing causal inferences impossible. Moreover, while the NFHS includes suitable variables for most determinants of stunting, we could not model the impact of immediate determinants, were unable to populate the groups of determinants chronic diseases and recurrent infections, and could only partially assess micronutrient deficiencies, healthcare, maternal or regional characteristics. Similarly, some of the proxies we used in our analysis may not provide an accurate estimate of the underlying concept of interest (e.g. type of cooking fuel as a proxy for indoor air pollution). Consequently, effect sizes for individual variables should be interpreted with caution. Even though the NFHS is considered a high-quality dataset, the logical consequence of assessing a large number of potential determinants was a high proportion of missing data (about 29%). Large numbers of missing values in selected variables, in particular in the outcome of interest, may have introduced selection bias. Indeed, compared to children with Z-score information, children for whom the outcome variable was missing were more likely to be younger and a twin (factors that increase stunting risk), as well as more likely to be born to mothers with greater maternal BMI and to live in wealthier and urban households (factors that decrease stunting risk). All differences were small, and are likely to increase uncertainty in effect estimates for these variables, thereby biasing results towards the null. Nevertheless, the large-scale, standardized and nationally representative nature of the NFHS, a response rate among eligible women of 94.5% [84] and coverage of a broad range of health risks make this data source ideally suited for a comprehensive analysis of stunting determinants.
Also, a recent methodological study suggests that cross-sectional studies can yield reliable estimates for risk factors that vary more across space at a fixed point in time than at a fixed location across different points in time [95].
Evidence-based approach. Based on earlier work in this field [25], a priori reasoning and extensive searches of the literature, we derived a schematic diagram of the multiple determinants of stunting. One limitation of this diagram is that it does not explicitly cover macro-level factors, such as good governance, peace and stability or climate change [18,79], factors that are likely to be relatively constant within a given country but that may be major underlying causes for cross-country differences in child undernutrition [94]. In addition, we neither examined the hierarchical structure contained within this diagram nor the pathways and relationships between individual determinants. Nevertheless, we believe that our approach to identifying all likely determinants of stunting and to populating as many of these as possible using an existing dataset is novel and takes up recent calls to incorporate systems thinking in epidemiology [19][20][21][22]96].
Statistical methods. Statistical modelling was realized by the innovative statistical approach of additive quantile regression based on boosting estimation, since this method allowed us to simultaneously investigate our three research objectives. As an extension of classical linear quantile regression, additive quantile regression has a flexible predictor that allows potentially non-linear functional shapes of continuous covariates to be determined in a data-driven way and spatial autocorrelation to be accounted for by including smooth spatial effects. Boosting combines parameter estimation and variable selection in one single estimation step, making it ideally suited to models with a large number of covariates, since subsequent steps of variable selection are not required. An inherent limitation of boosting is the lack of standard errors, which makes the use of re-sampling methods, such as the bootstrap, necessary to assess the variability of effect estimates. As a consequence, with boosting, statistical significance cannot be assessed in a traditional way (i.e. based on test statistics with well-known distributions). In our analysis, we instead derived statistical significance from the bootstrap results. For a categorical covariate, for example, significance was defined as having at least one significant category compared with the reference category; overall tests could not be conducted. A strength of boosting estimation is that it can be applied independently of the scale of the outcome and of the corresponding regression model, i.e., linear, quantile, or binary regression, as was demonstrated in our logistic regression analysis. On the other hand, an important limitation of our statistical modelling approach is that it does not explicitly account for the hierarchy implied by the conceptual diagram.
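The bootstrap-based assessment of significance can be illustrated as follows. The sketch computes a 95% percentile interval from a set of bootstrap estimates of one coefficient and flags the effect as significant when the interval excludes zero; this decision rule is an assumption made for illustration, and the estimates are hypothetical numbers rather than results from our analysis.

```python
def percentile(sorted_values, prob):
    """Empirical quantile of pre-sorted values using linear interpolation."""
    position = prob * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def bootstrap_interval(estimates, level=0.95):
    """Percentile interval, e.g. the 2.5% and 97.5% quantiles for level=0.95."""
    ordered = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    return percentile(ordered, alpha), percentile(ordered, 1.0 - alpha)


def excludes_zero(estimates, level=0.95):
    """Assumed decision rule: significant if the interval does not contain zero."""
    low, high = bootstrap_interval(estimates, level)
    return low > 0.0 or high < 0.0


# Hypothetical bootstrap estimates of two coefficients (100 replicates each).
clear_effect = [0.30 + 0.002 * b for b in range(100)]
unclear_effect = [-0.10 + 0.002 * b for b in range(100)]

print(bootstrap_interval(clear_effect), excludes_zero(clear_effect))
print(bootstrap_interval(unclear_effect), excludes_zero(unclear_effect))
```

With only 100 replicates, the outer quantiles rest on a handful of bootstrap estimates, so intervals obtained this way are themselves subject to some Monte Carlo variability.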

Implications for research and practice
Clearly, this research is located at the very beginning of a lengthy, cyclical process to develop and implement complex interventions, which comprises formative research and piloting as well as randomized controlled trials and implementation research [97]; and some of the insights might be specific for the Indian subcontinent. Do the insights gained impact in any way on how we might design and implement interventions more successfully?
The multi-factorial nature of child stunting offers many entry-points for technical and policy solutions and suggests that, ultimately, the impact of any intervention is influenced by the combined effects of all of these groups of determinants within the system. If we fully accept this notion, the finding that many single interventions show rather limited health impact is not surprising. Indeed, initial findings from the Millennium Villages project suggest that a combination of nutrition-specific, health-based approaches with food system- and livelihood-based interventions can achieve substantial reductions in childhood stunting [98], although the approach to analysis likely overstates the impact of the intervention [99]. Embracing systems thinking, it also becomes clear that the design and implementation of interventions must not take place out of context and that ''context'' goes beyond a broad distinction between food-secure and food-insecure populations [17,100]. A range of socio-economic, cultural and climatic factors at household, community and national levels impacts the choice of universal versus targeted approaches [60,101,102] and other specific aspects related to the design and delivery of intervention packages.
Revisiting the determinants of child stunting is timely in view of recent calls to set up a national nutrition strategy for India, which would combine food and nutrition programmes with broad investments in health, sanitation, agriculture and women's status [101], emphasizing multi-sectoral coordination to assure that ''every link in the chain of malnutrition (is) considered'' [102]. It is also relevant with respect to the global hunger summit hosted during the London Olympics 2012 and commitments to invest in a range of measures to reduce child malnutrition prior to the Rio Olympics in 2016. We hope that the insights offered here will add food for thought in relation to how these pledges are put into practice.