Abstract
Background and objectives
Child undernutrition is a leading global health concern, especially in low and middle-income developing countries, including Bangladesh. Thus, the objectives of this study are to develop an appropriate model for predicting the risk of undernutrition and identify its influencing predictors among under-five children in Bangladesh using explainable machine learning algorithms.
Materials and methods
This study used the latest nationally representative cross-sectional Bangladesh demographic health survey (BDHS), 2017–18 data. The Boruta technique was implemented to identify the important predictors of undernutrition, and logistic regression, artificial neural network, random forest, and extreme gradient boosting (XGB) were adopted to predict undernutrition (stunting, wasting, and underweight) risk. The models’ performance was evaluated through accuracy and area under the curve (AUC). Additionally, SHapley Additive exPlanations (SHAP) were employed to illustrate the influencing predictors of undernutrition.
Results
The XGB-based model outperformed the other models, with accuracy and AUC of 81.73% and 0.802 for stunting, 76.15% and 0.622 for wasting, and 79.13% and 0.712 for underweight, respectively. Moreover, the SHAP method demonstrated that the father’s education, wealth, mother’s education, BMI, birth interval, vitamin A, watching television, toilet facility, residence, and water source are the influential predictors of stunting. The influential predictors of wasting are BMI, mother’s education, and BCG, while those of underweight are father’s education, wealth, mother’s education, BMI, birth interval, toilet facility, breastfeeding, birth order, and residence.
Citation: Islam MM, Kibria NMSJ, Kumar S, Roy DC, Karim MR (2024) Prediction of undernutrition and identification of its influencing predictors among under-five children in Bangladesh using explainable machine learning algorithms. PLoS ONE 19(12): e0315393. https://doi.org/10.1371/journal.pone.0315393
Editor: Benojir Ahammed, Khulna University, BANGLADESH
Received: May 27, 2024; Accepted: November 25, 2024; Published: December 6, 2024
Copyright: © 2024 Islam et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data set used and analyzed in this study is freely available at the Demographic and Health Surveys (DHS) program website https://dhsprogram.com/data/available-datasets.cfm. Interested researchers can freely obtain the data by registering at the website https://dhsprogram.com/data/new-user-registration.cfm. The step-by-step instructions on how to register and download the data are provided at: https://dhsprogram.com/data/Access-Instructions.cfm.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors declare that they have no competing interests.
1. Introduction
Malnutrition refers to deficiencies, excesses, or imbalances in an individual’s energy and/or nutrient intake [1]. It encompasses two main types of conditions. The first is "undernutrition," which includes stunting, wasting, and underweight [2]. The other comprises overweight, obesity, and diet-related non-communicable diseases. Malnutrition, particularly undernutrition in early childhood, has an adverse effect on children’s physical and mental development and poses a significant risk for various chronic diseases, both communicable and non-communicable [3–5]. Undernourished individuals are more susceptible to illness and infection and face a higher risk of fractures [6]. The nation’s economy suffers long-term consequences from this problem, which also seriously impedes development. It is estimated that undernutrition accounts for one-third of sickness and mortality among children aged 59 months and under, and nearly 3.5 million fatalities worldwide [7, 8]. UNICEF reports that the current population of Bangladesh is 169.8 million, with 16.3 million being children under the age of five. It has been reported that approximately 9.5 million (54%) children are stunted, 17% are wasted, and 56% are underweight [9]. Despite the decrease in rates of undernutrition over the past few decades, child undernutrition remains a significant issue for Bangladesh. To enhance the management and control of undernutrition risk among children under five, it could be beneficial to employ a smart system that utilizes modern technologies to identify undernourished children early on [10]. Early detection of the associated risk factors, together with accurate assessment of the risk of undernutrition, can play a key role in timely intervention to prevent undernutrition and other associated diseases.
Thus, early detection of undernourished children and the identification of the contributing variables to their condition requires the implementation of a smart system.
Nowadays, machine learning (ML) is a modern technology that falls under the umbrella of artificial intelligence (AI). It is designed to identify patterns within data autonomously and utilize this information to make predictions. ML-based automated models that have been developed recently are gaining more and more interest for their ability to predict the risk of malnutrition among children under the age of five in various countries [11–19]. Over the past decade, a few studies have been done to develop a predictive tool for predicting the risk of undernutrition in Bangladesh [20–22]. The development of the prediction models for undernutrition is influenced by various factors that show considerable variation across different countries or regions over time. Rahman et al. [21] applied some ML algorithms to identify the risk factors of malnutrition based on Bangladesh Demographic Health Survey (BDHS), 2014 data. They utilized logistic regression (LR) model to identify the important factors of undernutrition. This study uses the latest BDHS, 2017–18 data and considers more predictors than those included in the referenced study [21]. The Boruta approach, which is a wrapper-based feature selection approach based on random forest (RF) classifier and can handle complex non-linear data with correlated features more effectively than the LR method, is applied here for selecting the important features. The paper utilizes the popular ML-based algorithms (LR, ANN, RF) and additionally extreme gradient boosting (XGB) for comparing models’ performance. Also, determining the influencing predictors that contribute to the outcome prediction using the SHapley Additive exPlanations (SHAP) method is one of the main focuses of this study.
An attempt has been made in this study to use an explainable ML-based model for the prediction of undernutrition status among under-five children in Bangladesh. Therefore, this study’s objective was to develop an appropriate model using explainable ML algorithms that predicts the risk of undernutrition among under-five children in Bangladesh. Furthermore, for model interpretation, the study determined the influential predictors that contribute to the prediction of undernutrition using SHAP, a post hoc model interpretation technique theoretically based on the Shapley value. This information can then be used as a guide for individualized prevention and treatment to prevent the development of undernutrition among under-five children in Bangladesh. The diagrammatic representation of the proposed framework is displayed in Fig 1.
The organization of the remaining part of this work is structured as follows: Section 2 introduces the materials and methods utilized. The results are shown in Section 3, and a thorough discussion is given in Section 4. Finally, the conclusions are presented in Section 5.
2. Materials and methods
2.1 Data source and study design
The dataset utilized in this study was obtained from the BDHS, 2017–18. This is the most recent comprehensive survey that includes all of the enumeration areas (EAs) of the nation. The samples of this survey were collected from different households using two-stage stratified cluster sampling [23]. A total of 675 EAs were chosen in the 1st stage, with the probability of selection being proportional to the size of the EA. In the 2nd stage, 30 households were chosen with a systematic procedure from each selected EA. About 18,000 ever-married women were selected to participate in the interview, with 17,863 (99%) of them completing the interview successfully. In the survey conducted in 2017–18, a total of 8,759 children under the age of five were identified as eligible for anthropometric measurement. Finally, after excluding refusals, "don’t know" responses, flagged cases, and other cases with technical problems, a total of 7,796 stunting, 7,777 wasting, and 7,838 underweight cases were incorporated for analysis.
2.2. Ethical approval
This study made use of a publicly available survey dataset from the BDHS, 2017–18. The BDHS surveys received ethical approval from the ICF Macro Institutional Review Board, Maryland, USA, and the National Research Ethics Committee of the Bangladesh Medical Research Council (BMRC), Dhaka, Bangladesh. Therefore, this study did not require any additional ethical approval.
2.3. Outcome variables
The outcome variable of this study was undernutrition (stunting, wasting, and underweight) among under-five children. Stunting was measured by the height-for-age Z-score (HAZ), wasting by the weight-for-height Z-score (WHZ), and underweight by the weight-for-age Z-score (WAZ) [24, 25]. According to WHO, children with HAZ < -2 standard deviations (SD) were considered stunted, those with WHZ < -2 SD wasted, and those with WAZ < -2 SD underweight [26]. These variables were encoded in binary form: 1 for stunted and 0 for not stunted, 1 for wasted and 0 for not wasted, and 1 for underweight and 0 for not underweight.
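The binary encoding can be illustrated with a minimal sketch. It assumes the Z-scores are already available as plain numbers and uses the standard WHO pairing (HAZ for stunting, WHZ for wasting, WAZ for underweight); the function name `encode_outcomes` and the sample values are hypothetical and not taken from the study.

```python
# Illustrative sketch only: binary encoding of the three undernutrition indicators.
# The -2 SD cut-off is the WHO threshold quoted in the text.
CUTOFF = -2.0

def encode_outcomes(haz, whz, waz, cutoff=CUTOFF):
    """Return 1/0 flags for stunting (HAZ), wasting (WHZ), and underweight (WAZ)."""
    return {
        "stunted": int(haz < cutoff),
        "wasted": int(whz < cutoff),
        "underweight": int(waz < cutoff),
    }

# Hypothetical child: low height-for-age and weight-for-age, normal weight-for-height.
example = encode_outcomes(haz=-2.4, whz=-0.8, waz=-2.1)
print(example)
```

A Z-score exactly equal to -2 is coded 0 here because the definition uses a strict inequality.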
2.4. Predictors
This study considered different demographic, socioeconomic, behavioral, and medical-related explanatory variables as predictors for undernutrition based on their availability in the dataset and on previous studies [27–32]. The predictors are age, residence, region, sex, mother education, father education, working, BMI, members, religion, wealth, breastfeeding, decision contraception, twin child, child sex, total children, birth order, birth interval, age at 1st birth, BCG, vitamin A, diarrhea, fever, cough, water source, cooking fuel, toilet facility, and watching television. Table 1 represents a detailed description and categorization of the chosen predictors.
2.5. Statistical analysis
The study participants’ background characteristics were reported as numbers (%) for the chosen predictors. Pearson’s χ2 test was executed to examine the association between different predictors and stunting, wasting, and underweight. Statistical package SPSS and programming languages R and Python were applied to analyze the data.
2.5.1. Handling missing value.
Missing data is the absence of values or information that should ideally be present in a dataset. In any data collection or analysis process, missing data can occur for a wide range of reasons, such as technical issues, human error, and non-response. Addressing missing data is an essential aspect of data analysis that has the potential to affect accuracy and reliability [33]. In this study, predictors with < 40% missing values were taken into consideration, while predictors with ≥ 40% missing values were eliminated from the dataset [34, 35]. There are several methods for handling missing data, including dropping records with missing values and imputing them. This study employed the widely used K-nearest neighbors (K-NN) method to impute the missing values [36–38].
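The two steps described above (dropping predictors with at least 40% missing values, then K-NN imputation) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: missing cells are represented by `None`, distance is the root mean squared difference over the columns two rows share, and the function names, the value of `k`, and the toy data are all hypothetical.

```python
import math

def drop_sparse_columns(rows, max_missing=0.40):
    """Keep columns whose share of missing values (None) is below max_missing."""
    n = len(rows)
    keep = [j for j in range(len(rows[0]))
            if sum(r[j] is None for r in rows) / n < max_missing]
    return [[r[j] for j in keep] for r in rows], keep

def knn_impute(rows, k=2):
    """Fill each missing cell with the mean of that column over the k nearest rows."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed column: treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                # Candidate donors are other rows that observed column j.
                donors = sorted(
                    (dist(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled

# Toy data: the third column is 75% missing and is dropped before imputation.
rows = [
    [1.0, 10.0, None],
    [2.0, None, None],
    [3.0, 30.0, 5.0],
    [4.0, 40.0, None],
]
kept_rows, kept_columns = drop_sparse_columns(rows)
filled = knn_impute(kept_rows, k=2)
print(kept_columns, filled)
```

The sketch assumes every column that survives the filter has at least one observed value; a production pipeline would more likely rely on an established imputation library.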
2.6. Data partition
The dataset was partitioned into two sets, training and testing, using a random partitioning method with an 8:2 ratio. There were 6237 children in the training set and 1559 in the testing set for stunting, 6222 children in the training set and 1555 in the testing set for wasting, and 6270 children in the training set and 1568 in the testing set for underweight.
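A random 8:2 partition can be sketched as follows. The seed and the function name `split_indices` are hypothetical, and rounding the test-set size to the nearest integer is an assumption; it is chosen here because it reproduces the set sizes reported above.

```python
import random

def split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)  # assumption: nearest-integer rounding
    return indices[n_test:], indices[:n_test]

# Sample sizes per indicator, as reported in the text.
sizes = {}
for name, n in {"stunting": 7796, "wasting": 7777, "underweight": 7838}.items():
    train, test = split_indices(n)
    sizes[name] = (len(train), len(test))
print(sizes)
```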
2.7. Feature selection
Feature selection, also known as variable/attribute selection in statistics as well as ML, plays a vital role in developing an effective prediction model by choosing the most important features. It can also lead to enhanced performance of the model, better generalization, speedier computing, and greater interpretability [39]. We utilized the Boruta feature selection method in this work to identify the important predictors of stunting, wasting, and underweight in the training phase. Boruta is a wrapper-based approach that makes use of the RF classifier and outperforms others because it is consistent and unbiased [40]. The following steps were used in the Boruta method to identify the important predictors:
- Step 1: Create a shadow dataset by shuffling the values of each predictor randomly.
- Step 2: Merge the original and shadow datasets to make a single dataset.
- Step 3: Train a RF classifier by utilizing the merged dataset and assess each predictor’s significance using a variable importance measure.
- Step 4: Calculate the Z-score for each predictor by utilizing the predictor’s importance values. The Z-score can be determined using the following formula: Z-score = (Predictor Importance - Mean(Shadow Predictor Importance)) / Standard Deviation(Shadow Predictor Importance).
- Step 5: Predictors exceeding a specific threshold Z-score (typically positive) are labeled as "Confirmed," while predictors falling below this threshold are labeled as "Rejected."
- Step 6: Repeat this process until all predictors are either confirmed or rejected.
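Steps 4 and 5 can be sketched numerically. The block below applies the Z-score formula exactly as written in Step 4 to hypothetical importance values; the function names, the importance numbers, and the threshold of 0 are assumptions made for illustration, and the shadow-feature creation and RF training of Steps 1 to 3 are not shown.

```python
from statistics import mean, stdev

def boruta_z_scores(importance, shadow_importance):
    """Z-score of each predictor relative to the shadow-predictor importances."""
    mu = mean(shadow_importance)
    sigma = stdev(shadow_importance)
    return {name: (value - mu) / sigma for name, value in importance.items()}

def label_predictors(z_scores, threshold=0.0):
    """Label predictors above the threshold as Confirmed, the rest as Rejected."""
    return {name: "Confirmed" if z > threshold else "Rejected"
            for name, z in z_scores.items()}

# Hypothetical RF importances for three real predictors and three shadow copies.
importance = {"wealth": 0.30, "BMI": 0.22, "religion": 0.02}
shadow = [0.03, 0.05, 0.04]

z = boruta_z_scores(importance, shadow)
labels = label_predictors(z)
print(z, labels)
```

In this toy run the shadow mean is 0.04, so "religion" falls below the shadow baseline and is rejected while the other two predictors are confirmed.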
2.8. Machine learning algorithms
The current study adopted four popular ML-based algorithms to predict undernutrition risk among under-five children in Bangladesh (Table 2).
2.8.1. Logistic regression.
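Logistic regression (LR) is a commonly utilized statistical method in predictive modeling for predicting the outcome of a categorical dependent variable [41]. The LR method uses the sigmoid function to determine the probability of the outcome variable based on the input predictors. The logistic regression equation can be defined as follows:
$$\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}$$ (1)
Where, $p_i$ indicates the probability of undernutrition for the ith child and $1-p_i$ indicates the probability of non-undernutrition; $x_{ki}$ is the kth input predictor of the ith child, $\beta_k$ is the kth regression coefficient, $\beta_0$ is the intercept, and $K$ is the number of predictors. The maximum likelihood method was utilized to estimate the model parameters for the logistic regression equation. Eq (1) can be represented as
$$p_i = \frac{\exp\left(\beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}\right)}{1 + \exp\left(\beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}\right)}$$ (2)
and odds as
$$\frac{p_i}{1-p_i} = \exp\left(\beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}\right)$$ (3)
If $p_i$ reaches the classification cut-off (conventionally 0.5), predict class 1 (undernutrition); otherwise, predict 0 (not-undernutrition).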
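The relationship between the linear predictor, the probability, and the odds can be checked with a small numerical sketch. The coefficient values and predictor values are hypothetical, and the 0.5 cut-off is the conventional default, assumed here rather than taken from the study.

```python
import math

def linear_predictor(beta0, betas, x):
    """beta0 + sum_k beta_k * x_k (the log-odds)."""
    return beta0 + sum(b * v for b, v in zip(betas, x))

def probability(eta):
    """Sigmoid of the linear predictor: probability of class 1."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(eta):
    """Odds p / (1 - p), equal to exp(eta)."""
    return math.exp(eta)

def classify(p, threshold=0.5):
    """Predict 1 (undernutrition) when p reaches the assumed cut-off."""
    return int(p >= threshold)

# Hypothetical intercept, two coefficients, and one child's predictor values.
eta = linear_predictor(-1.0, [0.8, -0.5], [2.0, 1.0])  # -1.0 + 1.6 - 0.5
p = probability(eta)
print(round(eta, 3), round(p, 3), round(odds(eta), 3), classify(p))
```

Taking the logarithm of `p / (1 - p)` recovers `eta`, which is the equivalence between the logit form and the probability form of the model.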
2.8.2 Artificial neural network.
An artificial neural network (ANN) is a type of non-linear ML technique that can be utilized to perform a variety of tasks, including classification and regression [42]. It consists of interconnected nodes, called neurons, which are arranged into three types of layers: an input layer, one or more hidden layers, and an output layer. During training, the network adjusts the weights and biases linked to each neuron to minimize error. This process is carried out by an optimization algorithm, such as gradient descent, that iteratively updates the weights and biases, with the sigmoid function used as the activation function. The function can be expressed as follows
$$\sigma(z) = \frac{1}{1+e^{-z}}$$ (4)
Here, z is the input. The procedure is repeated until the values no longer change between iterations.
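A forward pass through a network with one hidden layer and sigmoid activations can be sketched as below. The layer sizes, weights, and function names are hypothetical, and the training loop (gradient descent on the weights and biases) is deliberately omitted; this is only meant to show how the activation function is applied layer by layer.

```python
import math

def sigmoid(z):
    """Sigmoid activation: maps any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid, per neuron."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer -> single output neuron."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, [out_w], [out_b])[0]

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
x = [1.0, 2.0]
untrained = forward(x, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0)
weighted = forward(x, [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1], [1.0, -1.0], 0.2)
print(untrained, weighted)
```

With all weights and biases at zero, every neuron outputs 0.5, which is why `untrained` equals 0.5.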
2.8.3 Random forest.
Random forest (RF), introduced by Breiman, is a versatile ensemble-based ML algorithm [43]. The RF model was constructed using the following steps:
- Step 1: Select sample data using the bootstrap method from the training set.
- Step 2: Construct a decision tree (DT) for each sample data.
- Step 3: Build a forest with 500 trees or more by repeating Step 1 and Step 2.
- Step 4: Take into account the predictions made by each formed DT, then use a majority vote to determine the final prediction.
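Two of the steps above, bootstrap sampling (Step 1) and majority voting (Step 4), can be sketched without any ML library. Tree construction is not shown; the function names, the seed, and the toy votes are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
training_rows = list(range(10))           # stand-in for row indices of the training set
sample = bootstrap_sample(training_rows, rng)

# Hypothetical class predictions from five trees for one child.
final = majority_vote([1, 0, 1, 1, 0])
print(sample, final)
```

Because sampling is with replacement, some rows typically appear more than once in `sample` and others not at all. With an even number of trees a tie is possible, and this sketch then returns the class that was encountered first.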
2.8.4 Extreme gradient boosting.
Extreme gradient boosting (XGB) is a highly effective ensemble learning algorithm commonly used for tasks such as classification, regression, and ranking [44]. The algorithm is developed based on the principles of the gradient boosting framework. It works iteratively through the use of decision trees, each aimed at correcting the errors of the previous trees. For binary classification, a logistic loss function with a logistic transformation is useful for deriving the predicted probabilities from the model predictions. The logistic loss function is defined as
$$L = -\sum_{i=1}^{n}\left[y_i \log(p_i) + (1-y_i)\log(1-p_i)\right]$$ (5)
Where, $y_i$ is the true class label of the ith child, $p_i$ is the predicted probability that the ith child belongs to the positive class, and $n$ is the number of children.
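The logistic loss can be evaluated directly for a handful of predictions. The sketch below sums the per-child losses; dividing by the number of children would give the mean loss instead. The clipping constant `eps` and the example labels and probabilities are assumptions added for numerical safety and illustration.

```python
import math

def logistic_loss(y_true, p_pred, eps=1e-15):
    """Summed logistic (cross-entropy) loss for binary labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

# Hypothetical labels and predicted probabilities for four children.
y = [1, 0, 1, 0]
good = logistic_loss(y, [0.9, 0.1, 0.8, 0.2])   # confident and mostly right
poor = logistic_loss(y, [0.4, 0.6, 0.3, 0.7])   # leaning the wrong way
print(round(good, 4), round(poor, 4))
```

Predictions that put high probability on the wrong class are penalized heavily, which is the signal each new tree in the boosting sequence tries to reduce.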
2.9. Hyperparameters tuning
Hyperparameters in machine learning are variables whose values are set before the start of the learning process. They control the execution of the learning algorithm, affecting factors such as learning rate, regularization strength, and model complexity. Tuning these hyperparameters is crucial for optimizing model performance. The grid search approach with a 10-fold cross-validation (CV) protocol was employed to tune the hyperparameter values in the training phase.
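Grid search with k-fold cross-validation can be sketched as follows. The hyperparameter names and candidate values, the `score_fn` callback, and the toy scoring rule are all hypothetical, since the study's actual grid is not given in the text. In practice `score_fn` would train a model on the training indices and return its validation score, and the rows would be shuffled before being split into folds.

```python
from itertools import product

def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid(param_grid):
    """Enumerate every combination of the candidate hyperparameter values."""
    names = list(param_grid)
    return [dict(zip(names, combo))
            for combo in product(*(param_grid[name] for name in names))]

def grid_search(param_grid, n, score_fn, k=10):
    """Return the parameter combination with the best mean score over k folds."""
    folds = k_fold_indices(n, k)
    best_params, best_score = None, float("-inf")
    for params in grid(param_grid):
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            scores.append(score_fn(params, train, held_out))
        avg = sum(scores) / k
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score

# Toy scoring rule standing in for "fit the model, then score it on the held-out fold".
def toy_score(params, train_idx, valid_idx):
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

candidates = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1]}
best_params, best_score = grid_search(candidates, n=200, score_fn=toy_score)
print(best_params, best_score)
```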
2.10. Performance evaluation metrics
The model’s performance was assessed by accuracy, precision, recall, and F-score in the testing set [45–47]. The values of these performance metrics are calculated from the confusion matrix via four measurements: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). Also, the area under the curve (AUC) is considered for the evaluation of the models. The AUC is a single value representing the area under the ROC curve, demonstrating the model’s ability to discriminate between undernutrition and non-undernutrition. It is mathematically represented as
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{Sensitivity}\; d(1-\mathrm{Specificity})$$ (6)
The probability curve, known as the ROC curve, shows the relationship between sensitivity and 1-specificity at various classification cut-off points. It is a widely used metric for evaluating the predictive effectiveness of machine learning models in medical diagnostics [48].
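The metrics above can be computed from the four confusion-matrix counts, and the AUC can be obtained without drawing the ROC curve by using its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The counts, labels, and scores below are hypothetical; the last line checks that the precision and recall reported for the XGB stunting model are consistent with the reported F-score.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f_score(precision, recall)

def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

metrics = confusion_metrics(tp=40, fn=10, fp=10, tn=40)
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
reported_f = round(f_score(88.28, 89.41), 2)  # precision and recall in percent
print(metrics, toy_auc, reported_f)
```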
2.11. Predictor’s assessment using SHAP analysis
The traditional output of the XGB model only ranks the importance of variables; it does not provide a way to assess the direction and magnitude of their impact on outcomes. SHAP is a widely used framework for interpretability in machine learning [49]. It attributes the prediction of a model to its individual features, determining how much each feature contributes to the final outcome, and presents these contributions visually. It is based on Shapley values, originally introduced by Lloyd Shapley in the field of game theory [50], and belongs to the class of additive feature attribution methods. This approach provides a fair allocation among the participants (here, the predictors) and satisfies several desirable properties, including consistency, efficiency, dummy, and additivity. The efficiency property of the SHAP method leads to more reliable outcomes when compared to alternative methods, like local interpretable model-agnostic explanations. Predictors that have a positive SHAP value push the model’s prediction toward undernutrition, while predictors with a negative SHAP value push it toward non-undernutrition. In particular, the importance of an individual predictor, say the kth predictor, is ascertained through the Shapley value, which is computed using the following formula
$$\phi_k(v) = \sum_{S \subseteq M \setminus \{k\}} \frac{|S|!\,(|M|-|S|-1)!}{|M|!}\,\left[v(S \cup \{k\}) - v(S)\right]$$ (7)
Where, S represents a subset of predictors that does not contain the predictor for which we are determining the value $\phi_k(v)$; $S \cup \{k\}$ represents the group of predictors that includes S as well as the kth predictor; v(S) represents the outcome of an ML-based model that utilizes the predictors from S; $|M|$ is the total number of predictors; and $S \subseteq M \setminus \{k\}$ ranges over all subsets S of the M predictors, excluding the kth predictor.
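For a small number of predictors the Shapley value can be computed exactly by enumerating every subset, which makes the weighting in the formula concrete. The value function `v` below is a toy stand-in for a model's output on a subset of predictors; it, the predictor contributions, and the function names are hypothetical. SHAP libraries use faster model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_value(k, players, v):
    """Exact Shapley value of predictor k by enumerating all subsets without k."""
    others = [p for p in players if p != k]
    m = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            weight = factorial(len(s)) * factorial(m - len(s) - 1) / factorial(m)
            total += weight * (v(s | {k}) - v(s))  # weighted marginal contribution
    return total

def v(s):
    """Toy value function: two main effects plus one interaction."""
    out = 0.0
    if "wealth" in s:
        out += 0.3
    if "BMI" in s:
        out += 0.1
    if {"wealth", "BMI"} <= s:
        out += 0.2  # interaction, shared equally by the two predictors
    return out

players = ["wealth", "BMI", "residence"]
phi = {p: shapley_value(p, players, v) for p in players}
print(phi)
```

The three values sum to `v` of the full set minus `v` of the empty set, which is the efficiency property, and "residence", which never changes the output, receives zero, which is the dummy property.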
3. Results
3.1 Background characteristics
Table 3 presents the background characteristics of the study participants. The overall prevalence of stunting was 31.3%, wasting 8.5%, and underweight 22.5%. The average height, weight, and age of the children were 83.07±14.60 cm, 10.77±3.41 kg, and 28.61±17.58 months, respectively, and most children resided in rural areas. Children of mothers aged 45–49 years showed the highest percentage (66.7%) of stunting, children of mothers aged 40–44 years the highest percentage (10.6%) of wasting, and children of mothers aged 45–49 years the highest percentage (53.3%) of underweight. Sylhet division showed the highest percentage of stunting (41.2%), wasting (9.9%), and underweight (30.8%) compared to other divisions in Bangladesh. Children of uneducated mothers exhibited the largest percentage of stunting (44.3%), wasting (12.4%), and underweight (36.3%), while children of higher educated mothers had the lowest percentage of stunting (15.1%), wasting (6.2%), and underweight (10.9%). Children of underweight mothers had a greater percentage of stunting (41.8%), wasting (13.7%), and underweight (33.2%). Table 3 shows that age, residence, region, sex, mother education, father education, working, BMI, members, wealth, contraception, twin child, total children, birth order, birth interval, age at 1st birth, vitamin A, water source, cooking fuel, toilet facility, and watching television were significantly associated with stunting; mother education, BMI, child sex, BCG, vitamin A, and fever were significantly associated with wasting; and age, residence, region, sex, mother education, father education, working, BMI, members, wealth, breastfeeding, contraception, twin child, total children, birth order, birth interval, age at 1st birth, fever, cough, water source, cooking fuel, toilet facility, and watching television were significantly associated with underweight (p-value < 0.05).
3.2 Predictor’s selection by Boruta
The predictor selection results based on Boruta for stunting, wasting, and underweight are displayed in Figs 2–4. The method revealed that there are 17 important predictors associated with stunting out of 21, 5 predictors for wasting out of 7, and 17 predictors for underweight out of 23. The selected predictors of stunting are water source, residence, toilet facility, cooking fuel, twin child, age, contraception, total children, watching television, birth interval, division, birth order, vitamin A, BMI, wealth, mother education, and father education (Fig 2); those of wasting are fever, BCG, BMI, father education, and mother education (Fig 3); and those of underweight are residence, watching television, water source, toilet facility, contraception, birth interval, total children, fever, region, cooking fuel, birth order, age, twin child, BMI, wealth, mother education, and father education (Fig 4). The selected predictors have been incorporated for predicting the risk of undernutrition (stunting, wasting, and underweight) among under-five children in Bangladesh.
3.3. Performance comparison of ML-based models
The predictive performance of four ML-based models is presented in Table 4.
The XGB model attained the highest prediction accuracy of 81.73%, precision of 88.28%, recall of 89.41%, and F-score of 88.84% for stunting, while LR obtained the lowest accuracy of 76.40%, precision of 84.11%, recall of 85.56%, and F-score of 84.83%. The XGB model also achieved the highest accuracy for wasting (76.15%) and for underweight (79.13%) in comparison to the other models; the remaining metrics for these two indicators are reported in Table 4.
The corresponding ROC curves for stunting, wasting, and underweight are portrayed in Figs 5–7 and indicate that the XGB-based model achieved a larger area under the ROC curve than the other models (LR, ANN, and RF), with AUC values of 0.802 for stunting, 0.622 for wasting, and 0.712 for underweight. Hence, the XGB-based model appears to be the most appropriate choice for predicting indicators of undernutrition among under-five children in Bangladesh.
3.4 Influencing predictors for undernutrition
To examine the importance of each predictor in the prediction, a SHAP summary plot was produced for the best model (XGB) using SHAP values. In the SHAP summary plot, the x-axis represents the SHAP values, while the y-axis lists the predictors in order of their contribution. A predictor with a higher SHAP value is more likely to influence the occurrence of undernutrition. The red dots indicate higher values of a predictor, while the blue dots indicate lower values. The SHAP summary plots of the XGB model for stunting, wasting, and underweight are depicted in S1–S3 Figs. The SHAP method revealed that father education, wealth, mother education, BMI, birth interval, vitamin A, watching television, toilet facility, residence, and water source have SHAP values exceeding zero, thereby indicating that they are the influential predictors of stunting (S1 Fig). Similarly, BMI, mother education, and BCG (S2 Fig) are the influential predictors of wasting, and father education, wealth, mother education, BMI, birth interval, toilet facility, breastfeeding, birth order, and residence (S3 Fig) are the influential predictors of underweight.
4. Discussion
Nutrition is crucial for maintaining good health and promoting the growth and well-being of the human body at every stage of life. Severe malnutrition can lead to life-threatening consequences such as inhibited growth, impaired immune systems, and even death. Thus, this study highlighted the usefulness of several ML algorithms utilizing the most recent BDHS, 2017–2018 data to explore an appropriate explainable model that predicts the risk of undernutrition among children under five and determines the predictors that influence it. For each undernutrition indicator, four widely used ML-based algorithms were trained using the important predictors obtained by the Boruta method. The models’ performance was evaluated through accuracy, precision, recall, F-score, and ROC curve with AUC value. Based on the performance metrics, the XGB-based model was found superior to others for predicting the risk of undernutrition. The latest study conducted by Anku in Ghana, demonstrated that the XGB model was the best performer in predicting undernutrition among under-five children [51]. Other investigations also reported that the XGB-based model was the most precise for predicting undernutrition among children under five [52, 53]. The superiority of the XGB model may be due to its operation within the gradient boosting framework, which sequentially adds weak learners (typically DTs) and iteratively corrects errors by the preceding weak learners to achieve accurate prediction and it has the capability to effectively handle high-dimensional and complex data for classification [25]. However, the SHAP method in the XGB-based model reveals that the predictors of undernutrition vary across the three different indicators. Nevertheless, mother education and BMI are the coexistent predictors across three indicators of stunting, wasting, and underweight. This result is in line with the most recent research carried out in different nations [14, 54–57]. 
A mother who has received education may have a better awareness of the nutritional needs of her children. Better child feeding techniques, such as introducing supplementary foods to infants on time and exclusively breastfeeding during the first six months of a newborn’s life, are strongly linked to a decreased incidence of undernutrition in children [52]. Furthermore, mothers with higher levels of education are more likely to employ family planning [16], to use resources for the family effectively [19, 53], and to improve their children’s access to healthcare [25, 53]. The growth and development of a child greatly depend on the nutritional status of his/her mother. Children of underweight mothers face a much greater risk of stunting and wasting than children of mothers who have a normal weight [58]. Children of mothers with normal or above BMI have a lower risk of being underweight. Therefore, policymakers should prioritize the nutritional status of children to reduce malnutrition among them effectively. The coexistent predictors of stunting and underweight are the father’s education, wealth, birth interval, toilet facility, and residence. The socioeconomic status of the family influences the growth and development of the child, as well as their access to food security. Children from low-income families have more difficulty accessing food and medical care, which increases their risk of illness and death. This study demonstrated that the risk of stunting and underweight was highest in the poorest households, which coincides with recent research from Bangladesh and neighboring countries [56, 59, 60]. Birth spacing also has an impact on the nutritional status of under-five children. A lengthy gap between births is beneficial for the health and nutrition of both mothers and children, which is corroborated by previous studies [61, 62]. Children living in rural areas with poor sanitation are more likely to experience stunting and underweight.
Improving access to clean and safe toilet facilities, along with promoting proper sanitation and hygiene practices, is essential for preventing childhood undernutrition and promoting overall health and well-being. Additionally, vitamin A, watching television, and water source are influencing predictors of stunting; BCG is an influencing predictor of wasting; and breastfeeding and birth order are influencing predictors of underweight. These findings are aligned with earlier studies [63, 64]. Insufficient levels of vitamin A can impact various elements of growth and maturation, such as cellular growth, immune response, skeletal development, and hormonal equilibrium, all of which play a key role in the hindered growth of children [65]. The first-born siblings were prone to nurturing a deep sense of responsibility, the middle siblings a hunger for attention, and the youngest siblings a thirst for adventure and rebellion [66].
5. Conclusion
This study utilized four different ML-based algorithms to explore an appropriate explainable predictive model for the prediction of undernutrition among under-five Bangladeshi children. The comprehensive findings from our experiments indicate that, out of the four models, the XGB model is the most appropriate for predicting children with undernutrition. The SHAP method reveals that father education, wealth, mother education, BMI, birth interval, vitamin A, watching television, toilet facility, residence, and water source are the influential predictors of stunting among under-five children in Bangladesh. The influential predictors of wasting are BMI, mother education, and BCG, while those of underweight are father education, wealth, mother education, BMI, birth interval, toilet facility, breastfeeding, birth order, and residence. The proposed integrated framework may be used to create an automated tool for clinical settings that correctly detects undernourished children at an early stage. With the help of this information, a healthcare provider can make proper decisions and formulate patient-specific treatment plans to reduce wait times and healthcare expenses. Ultimately, our research may greatly enhance the care of undernourished children and assist decision-makers in taking appropriate initiatives to fulfill the Sustainable Development Goal (SDG) of decreasing pediatric undernutrition in Bangladesh by 2030.
5.1. Limitations of the study
This study is cross-sectional in nature, which restricts our capacity to establish causal relationships. Although several plausible factors were investigated, data on some other potentially significant predictors, such as poor consumption of vitamin supplements and incomplete immunization, were not available. Including such predictors in future work would aid in obtaining more precise results and enhanced model interpretability.
Supporting information
S1 Fig. SHAP summary plot of the XGB model for stunting.
https://doi.org/10.1371/journal.pone.0315393.s001
(TIF)
S2 Fig. SHAP summary plot of the XGB model for wasting.
https://doi.org/10.1371/journal.pone.0315393.s002
(TIF)
S3 Fig. SHAP summary plot of the XGB model for underweight.
https://doi.org/10.1371/journal.pone.0315393.s003
(TIF)
Acknowledgments
This study analyzed the dataset obtained from the Bangladesh Demographic and Health Survey (BDHS), 2017–18. The authors are thankful to the DHS Program for granting access to BDHS data. Also, the authors would like to thank the editor and the two anonymous reviewers for providing valuable comments and suggestions on the earlier version of the manuscript.
References
- 1. Ersado TL. Causes of malnutrition. In Combating Malnutrition through Sustainable Approaches 2022. IntechOpen.
- 2. Scrinis G. Reframing malnutrition in all its forms: a critique of the tripartite classification of malnutrition. Global Food Security. 2020;26:100396.
- 3. Grey K, Gonzales GB, Abera M, Lelijveld N, Thompson D, Berhane M, et al. Severe malnutrition or famine exposure in childhood and cardiometabolic non-communicable disease later in life: a systematic review. BMJ global health. 2021;6(3):e003161. pmid:33692144
- 4. Soliman A, De Sanctis V, Alaaraj N, Ahmed S, Alyafei F, Hamed N, et al. Early and long-term consequences of nutritional stunting: from childhood to adulthood. Acta Bio Medica: Atenei Parmensis. 2021;92(1). pmid:33682846
- 5. Cerf ME. Healthy lifestyles and noncommunicable diseases: nutrition, the life‐course, and health promotion. Lifestyle Medicine. 2021;2(2):e31.
- 6. Morales F, Montserrat-de la Paz S, Leon MJ, Rivero-Pino F. Effects of Malnutrition on the Immune System and Infection and the Role of Nutritional Strategies Regarding Improvements in Children’s Health Status: A Literature Review. Nutrients. 2023;16(1):1. pmid:38201831
- 7. Dukhi N. Global prevalence of malnutrition: evidence from literature. Malnutrition. 2020;1:1–6.
- 8. Hossain S, Chowdhury PB, Biswas RK, Hossain MA. Malnutrition status of children under 5 years in Bangladesh: A sociodemographic assessment. Children and Youth Services Review. 2020;117:105291.
- 9. Rahman MT, Alam MJ, Ahmed N, Roy DC, Sultana P. Trend of risk and correlates of under-five child undernutrition in Bangladesh: an analysis based on Bangladesh Demographic and Health Survey data, 2007–2017/2018. BMJ open. 2023;13(6):e070480. pmid:37308267
- 10. Govender I, Rangiah S, Kaswa R, Nzaumvila D. Malnutrition in children under the age of 5 years in a primary health care setting. South African Family Practice. 2021;63(1).
- 11. Thangamani D, Sudha P. Identification of malnutrition with use of supervised data mining techniques–decision trees and artificial neural networks. Int J Eng Comput Sci. 2014;3(09).
- 12. Kuttiyapillai D, Ramachandran R. Improved text analysis approach for predicting effects of nutrient on human health using machine learning techniques. IOSR J Comput Eng. 2014;16(3):86–91.
- 13. Krishna PV, Gurumoorthy S, Obaidat MS, Mani JJ, Rani Kasireddy S. Population classification upon dietary data using machine learning techniques with IOT and big data. Social Network Forensics, Cyber Security, and Machine Learning. 2019:9–27.
- 14. Bitew FH, Sparks CS, Nyarko SH. Machine learning algorithms for predicting undernutrition among under-five children in Ethiopia. Public health nutrition. 2022;25(2):269–80. pmid:34620263
- 15. Fenta HM, Zewotir T, Muluneh EK. A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones. BMC Medical Informatics and Decision Making. 2021;21:1–2.
- 16. Anku EK, Duah HO. Predicting and identifying factors associated with undernutrition among children under five years in Ghana using machine learning algorithms. Plos one. 2024;19(2):e0296625. pmid:38349921
- 17. Shen H, Zhao H, Jiang Y. Machine learning algorithms for predicting stunting among under-five children in Papua New Guinea. Children. 2023;10(10):1638. pmid:37892302
- 18. Chilyabanyama ON, Chilengi R, Simuyandi M, Chisenga CC, Chirwa M, Hamusonde K, et al. Performance of machine learning classifiers in classifying stunting among under-five children in Zambia. Children. 2022;9(7):1082. pmid:35884066
- 19. Talukder A, Ahammed B. Machine learning algorithms for predicting malnutrition among under-five children in Bangladesh. Nutrition. 2020;78:110861. pmid:32592978
- 20. Shahriar MM, Iqubal MS, Mitra S, Das AK. A Deep Learning Approach to Predict Malnutrition Status of 0–59 Month’s Older Children in Bangladesh. In 2019 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT) 2019; (pp. 145–149). IEEE.
- 21. Rahman SJ, Ahmed NF, Abedin MM, Ahammed B, Ali M, Rahman MJ, et al. Investigate the risk factors of stunting, wasting, and underweight among under-five Bangladeshi children and its prediction based on machine learning approach. Plos one. 2021;16(6):e0253172. pmid:34138925
- 22. National Institute of Population Research and Training (NIPORT), & ICF. Bangladesh demographic and health survey 2017–18. Dhaka, Bangladesh, and Rockville, Maryland, USA.
- 23. Al-Sadeeq AH, Bukair AZ, Al-Saqladi AW. Assessment of undernutrition using Composite Index of Anthropometric Failure among children aged < 5 years in rural Yemen. Eastern Mediterranean Health Journal. 2018;24(12).
- 24. Kassie GW, Workie DL. Exploring the association of anthropometric indicators for under-five children in Ethiopia. BMC public health. 2019;19:1–6.
- 25. Khan S, Zaheer S, Safdar NF. Determinants of stunting, underweight and wasting among children < 5 years of age: evidence from 2012–2013 Pakistan demographic and health survey. BMC public health. 2019;19:1–5.
- 26. Wondiye K, Asseffa NA, Gemebo TD, Astawesegn FH. Predictors of undernutrition among the elderly in Sodo zuriya district Wolaita zone, Ethiopia. BMC nutrition. 2019;5:1–7.
- 27. Modjadji P, Madiba S. Childhood undernutrition and its predictors in a rural health and demographic surveillance system site in South Africa. International journal of environmental research and public health. 2019;16(17):3021. pmid:31438531
- 28. Chowdhury MR, Rahman MS, Billah B, Rashid M, Almroth M, Kader M. Prevalence and factors associated with severe undernutrition among under-5 children in Bangladesh, Pakistan, and Nepal: a comparative study using multilevel analysis. Scientific Reports. 2023;13(1):10183. pmid:37349482
- 29. Kiarie J, Karanja S, Busiri J, Mukami D, Kiilu C. The prevalence and associated factors of undernutrition among under-five children in South Sudan using the standardized monitoring and assessment of relief and transitions (SMART) methodology. BMC nutrition. 2021;7(1):25. pmid:34044874
- 30. Danso F, Appiah MA. Prevalence and associated factors influencing stunting and wasting among children of ages 1 to 5 years in Nkwanta South Municipality, Ghana. Nutrition. 2023;110:111996. pmid:37003173
- 31. Menalu MM, Bayleyegn AD, Tizazu MA, Amare NS. Assessment of prevalence and factors associated with malnutrition among under-five children in Debre Berhan town, Ethiopia. International Journal of General Medicine. 2021:1683–97. pmid:33976568
- 32. Hoffman DJ, Kassim I, Ndiaye B, McGovern ME, Le H, Abebe KT, et al. Childhood Stunting and wasting following independence in South Sudan. Food and Nutrition Bulletin. 2022;43(4):381–94. pmid:36245391
- 33. Kwak SK, Kim JH. Statistical data preparation: management of missing values and outliers. Korean journal of anesthesiology. 2017;70(4):407. pmid:28794835
- 34. Samuel O, Zewotir T, North D. Application of machine learning methods for predicting under-five mortality: analysis of Nigerian demographic health survey 2018 dataset. BMC Medical Informatics and Decision Making. 2024;24(1):86. pmid:38528495
- 35. Zhang Q, Wan NJ. Simple Method to Predict Insulin Resistance in Children Aged 6–12 Years by Using Machine Learning. Diabetes, Metabolic Syndrome and Obesity: Targets and Therapy. 2022:2963–75. pmid:36193541
- 36. Faisal S, Tutz G. Multiple imputation using nearest neighbor methods. Information Sciences. 2021;570:500–16.
- 37. Emmanuel T, Maupong T, Mpoeleng D, Semong T, Mphago B, Tabona O. A survey on missing data in machine learning. Journal of Big data. 2021;8:1–37.
- 38. Beretta L, Santaniello A. Nearest neighbor imputation algorithms: a critical evaluation. BMC medical informatics and decision making. 2016;16:197–208. pmid:27454392
- 39. Chen RC, Dewi C, Huang SW, Caraka RE. Selecting critical features for data classification based on machine learning methods. Journal of Big Data. 2020 Jul 23;7(1):52.
- 40. Islam MM, Alam MJ, Maniruzzaman M, Ahmed NF, Ali MS, Rahman MJ, et al. Predicting the risk of hypertension using machine learning algorithms: A cross sectional study in Ethiopia. PLoS One. 2023;18(8):e0289613. pmid:37616271
- 41. Ranganathan P, Pramesh CS, Aggarwal R. Common pitfalls in statistical analysis: logistic regression. Perspectives in clinical research. 2017;8(3):148–51. pmid:28828311
- 42. Hassoun MH. Fundamentals of artificial neural networks. MIT press; 1995.
- 43. Breiman L. Random forests. Machine learning. 2001;45:5–32.
- 44. Chen T, Guestrin C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining 2016; 785–794.
- 45. Buczak AL, Guven E. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications surveys & tutorials. 2016;18(2):1153–1176.
- 46. Dahiya P, Srivastava DK. Intrusion detection system on big data using deep learning techniques. Int J Innov Technol Exploring Eng. 2020;9(4):3242–3247.
- 47. Hagar AA, Gawali BW. Implementation of Machine and Deep Learning Algorithms for Intrusion Detection System. In G. Rajakumar et al. (eds.), Intelligent Communication Technologies and Virtual Mobile Networks. Springer Nature Singapore. 2023:1–20.
- 48. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian journal of internal medicine. 2013;4(2):627. pmid:24009950
- 49. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in neural information processing systems. 2017;30.
- 50. Kuhn HW, Tucker AW, editors. Contributions to the Theory of Games. Princeton University Press; 1953.
- 51. Steinfath M, Vogl S, Violet N, Schwarz F, Mielke H, Selhorst T, et al. Simple changes of individual studies can improve the reproducibility of the biomedical scientific process as a whole. PLoS One. 2018;13(9):e0202762. pmid:30208060
- 52. Antipov EA, Pokryshevskaya EB. Interpretable machine learning for demand modeling with high-dimensional data using Gradient Boosting Machines and Shapley values. Journal of revenue and pricing management. 2020;19:355–64.
- 53. Sumon IH, Hossain M, Ar Salan S, Kabir MA, Majumder AK. Determinants of coexisting forms of undernutrition among under‐five children: Evidence from the Bangladesh demographic and health surveys. Food Science & Nutrition. 2023;11(9):5258–69. pmid:37701232
- 54. Khan RE, Raza MA. Determinants of malnutrition in Indian children: new evidence from IDHS through CIAF. Quality & Quantity. 2016;50:299–316.
- 55. Akombi BJ, Agho KE, Merom D, Hall JJ, Renzaho AM. Multilevel analysis of factors associated with wasting and underweight among children under-five years in Nigeria. Nutrients. 2017;9(1):44. pmid:28075336
- 56. Vijay J, Patel KK. Malnutrition among under-five children in Nepal: A focus on socioeconomic status and maternal BMI. Clinical Epidemiology and Global Health. 2024;27:101571.
- 57. Ahmmed F, Hasan MN, Hossain MF, Khan MT, Rahman MM, Hussain MP, et al. Association between short birth spacing and child malnutrition in Bangladesh: a propensity score matching approach. BMJ Paediatrics Open. 2024;8(1). pmid:38499349
- 58. Ntambara J, Zhang W, Qiu A, Cheng Z, Chu M. Optimum birth interval (36–48 months) may reduce the risk of undernutrition in children: A meta-analysis. Frontiers in Nutrition. 2023;9:939747. pmid:36712519
- 59. Ssentongo P, Ba DM, Ssentongo AE, Fronterre C, Whalen A, Yang Y, et al. Association of vitamin A deficiency with early childhood stunting in Uganda: A population-based cross-sectional study. PloS one. 2020;15(5):e0233615. pmid:32470055
- 60. Das S, Gulshan J. Different forms of malnutrition among under five children in Bangladesh: a cross sectional study on prevalence and determinants. BMC Nutrition. 2017;3:1–2.
- 61. Hossain MM, Yeasmin S, Abdulla F, Rahman A. Rural-urban determinants of vitamin A deficiency among under 5 children in Bangladesh: Evidence from National Survey 2017–18. BMC Public Health. 2021;21:1–0.
- 62. Ahmed R, Ejeta Chibsa S, Hussen MA, Bayisa K, Tefera Kefeni B, Gezimu W, et al. Undernutrition among exclusive breastfeeding mothers and its associated factors in Southwest Ethiopia: A community-based study. Women’s Health. 2024;20:17455057241231478.
- 63. Hossain MM, Abdulla F, Rahman A. Prevalence and risk factors of underweight among under-5 children in Bangladesh: Evidence from a countrywide cross-sectional study. PLoS One. 2023;18(4):e0284797. pmid:37093817
- 64. Chandna A, Bhagowalia P. Birth order and children’s health and learning outcomes in India. Economics & Human Biology. 2024;52:101348. pmid:38237431
- 65. Mutumba R, Pesu H, Mbabazi J, Greibe E, Olsen MF, Briend A, et al. Correlates of iron, cobalamin, folate, and vitamin A status among stunted children: A cross-sectional study in Uganda. Nutrients. 2023;15(15):3429. pmid:37571364
- 66. Yu T, Chen C, Jin Z, Yang Y, Jiang Y, Hong L, et al. Association of number of siblings, birth order, and thinness in 3- to 12-year-old children: a population-based cross-sectional study in Shanghai, China. BMC Pediatrics. 2020;20:1–3.