Abstract
Aim
Malnutrition in pregnant women significantly affects the health of both mother and child. This research aims to identify the best machine learning (ML) techniques for predicting the nutritional status of pregnant women in Bangladesh and to identify the most important features based on the best-performing algorithm.
Methods
This study used retrospective cross-sectional data from the Bangladesh Demographic and Health Survey (BDHS) 2017–18. Different feature transformations and machine learning classifiers were applied to identify the best-performing transformation and classification model.
Results
This investigation found that robust scaling outperformed all other feature transformation methods. The Random Forest (RF) algorithm with robust scaling outperformed all other machine learning algorithms, with 74.75% accuracy, a 57.91% kappa statistic, 73.36% precision, 73.08% recall, and a 73.09% F1 score. With robust scaling, the RF algorithm also had the highest precision (76.76%) and F1 score (71.71%) for predicting the underweight class compared with the other algorithms, and a precision of 82.01% and F1 score of 83.78% for the overweight/obese class. The respondent’s age, wealth index, region, husband’s education level, husband’s age, and husband’s occupation were crucial features for predicting the nutritional status of pregnant women in Bangladesh.
Citation: Begum N, Rahman MM, Omar Faruk M (2024) Machine learning prediction of nutritional status among pregnant women in Bangladesh: Evidence from Bangladesh demographic and health survey 2017–18. PLoS ONE 19(5): e0304389. https://doi.org/10.1371/journal.pone.0304389
Editor: Benojir Ahammed, Khulna University, BANGLADESH
Received: January 24, 2024; Accepted: May 12, 2024; Published: May 31, 2024
Copyright: © 2024 Begum et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The code and data of this study can be found online at: https://www.kaggle.com/datasets/faruk268/nutritional-status-of-the-pregnant-women.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Nutritional status is the outcome of the biological phenomenon of food utilization and is a vital aspect of health. Good nutrition is linked to better health outcomes in infants, children, and mothers, stronger immune systems, reduced risk of non-communicable diseases (such as diabetes and cardiovascular disease), and safer pregnancies and childbirths [1]. In contrast, malnutrition can lead to a range of problems, including low work productivity, higher chances of miscarriage, stillbirth, low birth weight, infant mortality, and fatal complications during pregnancy, delivery, and the postpartum period [2]. Malnutrition is also widespread: globally, nearly 1.9 billion adults are overweight or obese, and approximately 462 million adults are underweight [3]. In Bangladesh, the prevalence of underweight is declining, while overweight and obesity are increasing [3]. Rural-urban disparities in unhealthy body mass index (BMI) categories are also a significant concern. According to the Bangladesh Demographic and Health Survey (BDHS) 2017–18, 13% of rural women and 9% of urban women are underweight, whereas 43% of urban women are overweight or obese, compared with 28% of rural women [3, 4].
Approximately 200 million women become pregnant each year, most of whom live in developing countries [5]. Maternal nutrition during pregnancy is critical for the short- and long-term health of both the mother and her growing fetus [6], and a healthy pregnancy outcome depends on good nutritional status before and during pregnancy. Maternal malnutrition poses significant health risks for both the pregnant mother and her children [7]. In Ethiopia, the prevalence of undernutrition among pregnant women ranges from 21.8% to 43.1%, and it is higher among rural women [8, 9]. Malnutrition in pregnant mothers often goes unnoticed and unreported, so its extent, consequences, and causes receive insufficient attention [10]. Extensive research has examined the impact of malnutrition on pregnant women’s health, and numerous contributing factors have been identified, including demographic, household, physical, socioeconomic, and cultural factors [11]. Previous studies have shown that individuals with a lower wealth index and less education are at a higher risk of being underweight but at a lower risk of being overweight [12].
Machine learning lies at the intersection of artificial intelligence and statistical learning and explores large datasets to uncover unknown patterns or relationships [13]. Several studies have applied machine learning models to different Demographic and Health Survey (DHS) datasets to identify the most informative risk factors and predict nutritional status, for example for child malnutrition [14, 15] and malnutrition among women [2, 3]. Islam et al. (2022) used the BDHS 2014 dataset with 15,464 respondents and applied five algorithms, namely naïve Bayes (NB), decision tree (DT), support vector machine (SVM), artificial neural network (ANN), and random forest (RF), to predict malnutrition among women. The RF classifier had the highest accuracy (81.4%) and AUC (0.837) for underweight and the highest accuracy (82.4%) and AUC (0.853) for overweight/obese [2]. Mukuku et al. (2019) conducted a cross-sectional study of 263 children and used a logistic regression (LR)-based algorithm to predict nutritional status, obtaining an AUC of 0.969, sensitivity of 93.5%, and specificity of 93.1% [16]. Hossain et al. (2022) applied six machine learning algorithms to predict unintended pregnancies among married women in Bangladesh using data on the pregnancy intention of 1,129 respondents; among them, the elastic net regression (ENR) algorithm achieved the highest AUC (74.67%) [17].
Researchers have recently compared the prediction performance of various machine learning algorithms [18], and machine learning is now widely used in health-related research [13, 14, 19]. However, to our knowledge, no previous research has used machine learning algorithms to predict the nutritional status of pregnant women. The main objective of this study was to use various well-known machine learning algorithms to predict the nutritional status of pregnant women in Bangladesh and to identify the critical features of the model with the most accurate predictions.
Methodology
Data source and sampling design
Data on the nutritional status of currently pregnant women were extracted from the Bangladesh Demographic and Health Survey (BDHS) 2017–18, which is accessible online [4]. This study included only women who were currently pregnant and excluded all other women. The BDHS 2017–18 interviewed 20,127 ever-married women aged 15–49, of whom 18,895 were married; only the 1,129 currently pregnant women were included in this study. The purpose of the BDHS was to collect household data to monitor and evaluate the health status of mothers and children, including nutrition, causes of death, newborn care, women’s empowerment, and more. The United States Agency for International Development (USAID) provided financial assistance for the survey in Bangladesh. The 2017–18 BDHS used a two-stage stratified sampling procedure, with data collected from eight divisions: Barisal, Chattogram, Dhaka, Khulna, Mymensingh, Rajshahi, Rangpur, and Sylhet. The sampling frame was the list of enumeration areas (EAs) from the 2011 Population and Housing Census of Bangladesh, provided by the Bangladesh Bureau of Statistics (BBS). In the first stage, 675 EAs (425 rural and 250 urban) were selected with probability proportional to EA size. In the second stage, a complete household listing was carried out in all selected EAs to provide a sampling frame for the systematic selection of 30 households per EA. This design allows statistically reliable estimates of key demographic and health indicators for the country as a whole and for urban and rural areas separately [4].
Study variables and measurement
Dependent variable.
The study primarily focused on assessing the nutritional status of pregnant women by utilizing the body mass index (BMI) as a measure. According to the World Health Organization (WHO), BMI was categorized as underweight (BMI<18.5 kg/m2), normal weight (18.5≤BMI≤24.9 kg/m2), overweight (25.0≤ BMI<30.0 kg/m2), and obese (BMI≥30.0 kg/m2) [20]. However, for this study, overweight and obese women were classified as a single category.
This study aimed to assess the nutritional status of pregnant women. Because a pregnant woman’s current weight includes gestational weight gain, we started from each woman’s current weight and estimated her pre-pregnancy weight. During a normal pregnancy, women gain 11.5–16 kg, and usual weight gain (UWG) is not affected by the height of the pregnant woman [21, 22]. To calculate pre-pregnancy weight, we subtracted the UWG during pregnancy from the current weight.
In the first trimester (weeks 1–13), UWG is 0.5–2 kg in total; in the second and third trimesters, it is 0.35–0.5 kg/week [21–25]. Because little weight gain occurs during the first trimester, we set UWG for this period at 0.5 kg [21, 26]. We divided the second trimester into two equal parts: the first half (weeks 14–20) and the second half (weeks 21–27). Weight gain is typically lower during the first half of the second trimester [27], so we set UWG for this part at 0.35 kg/week. Weight gain in the second half of the second trimester is comparatively higher than in the first half [23], so we set UWG for this part at 0.5 kg/week. In the third trimester, UWG is also 0.5 kg/week [28].
In summary, UWG is 0.5 kg during the first trimester (the first three months) and approximately 3 kg (0.5 + 0.35 × 7 = 2.95 kg) up to 20 weeks of gestation (the first five months). For each subsequent month, UWG is taken as 2 kg (0.5 kg/week × 4 weeks). The pre-pregnancy weight of each pregnant woman is then calculated as: pre-pregnancy weight = current weight − UWG during pregnancy.
For example, consider a pregnant woman who is seven months into her pregnancy and currently weighs 80 kg. Her UWG is 7 kg (3 kg for the first five months plus 2 kg for each of the sixth and seventh months), so her pre-pregnancy weight is 73 kg and her BMI is 73 kg divided by her height in meters squared. BMI was calculated for each respondent as pre-pregnancy weight in kg divided by height in meters squared and categorized as underweight (<18.5 kg/m2), normal weight (18.5≤BMI≤24.9 kg/m2), or overweight/obese (≥25.0 kg/m2). Previous studies conducted in Asian countries have used the BMI categories recommended by the World Health Organization (WHO) [2, 29–33]; following this literature, we used the WHO-recommended BMI categories in our analysis.
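The weight-gain schedule and BMI categorization described above can be expressed as a short Python sketch. It assumes gestational age is available in completed weeks; a month-based record, as in the worked example, would first have to be converted to weeks, which is an additional assumption. The function names and the 1.60 m height are hypothetical. At 20 weeks the sketch gives 2.95 kg (about 3 kg), and at 28 weeks (roughly seven months) it gives 6.95 kg, close to the 7 kg used in the example.

```python
def usual_weight_gain_kg(weeks: int) -> float:
    """Usual weight gain (UWG) at a given gestational week, per the schedule above."""
    if weeks <= 13:
        # First trimester: a flat 0.5 kg in total.
        return 0.5
    if weeks <= 20:
        # First half of the second trimester (weeks 14-20): 0.35 kg/week.
        return 0.5 + 0.35 * (weeks - 13)
    # Week 21 onwards (second half of the second trimester and the third
    # trimester): 0.5 kg/week on top of the gain accumulated by week 20.
    return 0.5 + 0.35 * 7 + 0.5 * (weeks - 20)


def pre_pregnancy_bmi(current_weight_kg: float, height_m: float, weeks: int) -> float:
    """BMI from the estimated pre-pregnancy weight (current weight minus UWG)."""
    pre_weight = current_weight_kg - usual_weight_gain_kg(weeks)
    return pre_weight / height_m ** 2


def bmi_category(bmi: float) -> str:
    """Three-class outcome: WHO cut-offs with overweight and obese merged."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal weight"
    return "overweight/obese"


if __name__ == "__main__":
    # Hypothetical respondent: 80 kg at 28 weeks, 1.60 m tall.
    bmi = pre_pregnancy_bmi(80.0, 1.60, 28)
    print(round(bmi, 2), bmi_category(bmi))
```

For this hypothetical respondent the estimated pre-pregnancy BMI is about 28.5 kg/m2, which falls in the overweight/obese category.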
Independent variable.
Table 1 presents the predictor names, types, descriptions, and categorizations based on previous relevant works [2, 3, 14, 15]. The predictors included the respondent’s age, place of residence, region, religion, educational attainment, current employment status, wealth index, total number of children ever born, number of living children, whether the current pregnancy was wanted, current breastfeeding status, access to mass media, the partner’s age, occupation, and educational attainment, toilet facility, and source of drinking water.
Data pre-processing
This study used the BDHS 2017–18 data. Based on a literature review, we first compiled a list of candidate variables and extracted them from the BDHS data. Most of the features considered in this study were categorical; the few numeric variables were also converted into categorical variables for consistency. Before model training, an extensive exploratory analysis was conducted, and the categorical features were encoded as numerical values. As part of this analysis, the frequency of every variable was calculated to check for anomalies such as inconsistent values, missing observations, and outliers. Inconsistent values were removed or replaced with consistent values, and observations with missing values or outliers were deleted from the dataset.
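As a rough illustration of the cleaning steps above, the following Python sketch bins a numeric age into five-year groups, drops records with a missing value, and integer-encodes the categorical features. It is not the authors’ code (which the Data Availability statement says is published online); the column names, the five-year bins, and the alphabetical coding order are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def age_group(age: int) -> str:
    """Bin an age in 15-49 into a five-year group such as '20-24' (hypothetical helper)."""
    lower = 15 + 5 * ((age - 15) // 5)
    return f"{lower}-{lower + 4}"


def clean_and_encode(
    rows: List[Row], columns: List[str]
) -> Tuple[List[Dict[str, int]], Dict[str, Dict[str, str]]]:
    """Drop rows with a missing value in any selected column, then integer-encode each column."""
    complete = [r for r in rows if all(r.get(c) is not None for c in columns)]
    # One codebook per column: categories sorted alphabetically -> 0, 1, 2, ...
    codebook = {
        c: {value: code for code, value in enumerate(sorted({r[c] for r in complete}))}
        for c in columns
    }
    encoded = [{c: codebook[c][r[c]] for c in columns} for r in complete]
    return encoded, codebook
```

Keeping the codebook alongside the encoded rows makes it possible to map model outputs and feature importance results back to the original category labels.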
Feature selection.
After removing missing values, we proceeded with feature selection. Feature selection reduces data dimensionality to minimize processing time and computational cost [34]. To enhance the overall predictive performance of the classification, we chose a subset of variables that contributed significantly to the target class. We identified these variables by performing a chi-square (χ2) test of association between nutritional status (BMI category) and each candidate variable, adjusted for the complex survey design using second-order Rao–Scott corrections [35, 36], and retained the variables with p < 0.05. Thirteen features met this criterion and were used to develop the classification models: the respondent’s age, region, place of residence, highest educational level, wealth index, total children ever born, number of living children, whether the current pregnancy was wanted, access to mass media, husband’s age, husband’s education level, husband’s occupation, and toilet facility. S1 Table lists the chi-square test results for all features, adjusted using second-order Rao–Scott corrections.
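The screening step can be sketched in Python as a plain Pearson chi-square test of independence between a feature’s categories (rows) and the three nutritional-status classes (columns). This sketch does not reproduce the second-order Rao–Scott correction used in the study, which requires the survey’s weights, strata, and clusters (R’s survey package provides it through `svychisq()`, and `scipy.stats.chi2_contingency` gives the unweighted test). With a three-class outcome, the degrees of freedom, 2(r − 1), are always even, so the p-value can be computed from a closed-form survival function using only the standard library. The function names are illustrative.

```python
import math
from typing import List, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square(df), valid only for a positive, even df.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even number of degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int, float]:
    """Unweighted Pearson chi-square test of independence for an r x c table.

    Rows are levels of a candidate feature; columns are outcome classes.
    Assumes every row and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf_even_df(stat, df)
```

A feature would be retained when the returned p-value is below 0.05, mirroring the selection rule in the text.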
Dealing with imbalanced datasets.
In the BMI data of pregnant women, we observed a class imbalance that could bias estimates of performance measures such as accuracy and precision. Only 15% of pregnant women in Bangladesh were overweight/obese; such an imbalanced class distribution may lead to biased and unreliable results when using ML. To address this issue, we applied an oversampling approach, the Synthetic Minority Oversampling Technique (SMOTE), developed by Chawla et al. [37].
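The core of SMOTE is interpolation: a synthetic minority-class row is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The Python sketch below shows only that step, using a hypothetical `smote` function and Euclidean distance. In practice, a library implementation such as `SMOTE` from imbalanced-learn (`imblearn.over_sampling`, called via `fit_resample`) would normally be used, and `SMOTENC` in the same module is designed for categorical features. The text does not state which implementation or settings (such as k) were used.

```python
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_synthetic minority-class rows by SMOTE-style interpolation.

    `minority` holds the numeric feature vectors of ONE class (at least two rows).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of `base` (Euclidean distance).
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

To balance a class, `n_synthetic` would be the difference between the majority-class and minority-class counts in the data being resampled. Because every synthetic row lies between two real minority rows, the new points stay inside the region the minority class already occupies rather than being exact duplicates.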
Model validation.
For the ML analyses, the dataset was randomly divided into two distinct subsets: a training set comprising 70% of the data and a test set comprising the remaining 30%, which was used to predict the response variable and compare the predicted outcomes with the observed outcomes. All models were trained using 10-fold cross-validation to assess performance and optimize the prediction models. The Statistical Package for the Social Sciences (SPSS) version 26 and Python version 3.9.13 were used for data management and analysis.
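The validation scheme (a random 70/30 train–test split, then 10-fold cross-validation within the training set) can be sketched with index lists alone. This is an illustrative Python sketch with hypothetical function names; the text does not say whether the split or folds were stratified by class, so both are plain random here, and the seed is arbitrary. scikit-learn’s `train_test_split` and `KFold` provide equivalent functionality.

```python
import random
from typing import Iterator, List, Tuple


def train_test_split_indices(
    n: int, test_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Randomly partition row indices 0..n-1 into (train, test)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices: List[int], k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        fit = [idx for f, fold in enumerate(folds) if f != i for idx in fold]
        yield fit, folds[i]
```

For example, with 1,129 rows, `train_test_split_indices(1129)` returns 790 training and 339 test indices. Each training index appears in exactly one validation fold, so every training row is used once for validation across the ten folds, while the test set is never touched during tuning.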
Feature transformation (FT)
Four feature transformation techniques were applied to make the spread of the features more uniform, reduce skewness, and bring relationships closer to linear and additive; Table 2 briefly describes each transformation. The transformations were Standardization, Min-Max Scaling, Log Scaling, and Robust Scaling. We then identified the transformation that yielded the best-performing ML model.
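Since Table 2 is not reproduced here, the following Python sketch uses the standard definitions of the four transformations for a single feature column. Using log(1 + x) for log scaling is an assumption (it keeps zero-valued codes finite), and each function assumes a non-constant feature. Robust scaling centres on the median and divides by the interquartile range, matching the default 25th–75th percentile range of scikit-learn’s `RobustScaler`. For brevity, each function fits and transforms the same list; in practice the parameters would be estimated on the training data and reused for the test data.

```python
import math
import statistics
from typing import List


def standardize(xs: List[float]) -> List[float]:
    """Standardization: subtract the mean, divide by the population standard deviation."""
    mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def min_max(xs: List[float]) -> List[float]:
    """Min-Max scaling to the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def log_scale(xs: List[float]) -> List[float]:
    """Log scaling, assumed here to be log(1 + x)."""
    return [math.log1p(x) for x in xs]


def robust_scale(xs: List[float]) -> List[float]:
    """Robust scaling: subtract the median, divide by the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]
```

Because the median and IQR ignore extreme values, a single outlier barely changes how the other values are scaled: in `robust_scale([1, 2, 3, 4, 100])` the first four values map to −1, −0.5, 0, and 0.5 while 100 maps to 48.5, whereas Min-Max scaling squeezes the first four values into the range 0 to about 0.03.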
Machine learning algorithms
This research used ten machine learning algorithms to predict the nutritional status (underweight, normal weight, overweight/obese) of pregnant women in Bangladesh, and their performance was compared using model evaluation measures. The algorithms were logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbors (k-NN), support vector machine (SVM), naïve Bayes (NB), adaptive boosting (ADB), extreme gradient boosting (XGB), gradient boosting, and bagging. Supplement A in the S1 File briefly describes each algorithm.
Performance evaluation
No single measure can capture every aspect of a model’s performance, so research supports using a variety of measures to assess and summarize it. Measures such as accuracy, the kappa statistic, precision, recall (sensitivity), F1 score, and the area under the receiver operating characteristic curve (AUC) should be used to evaluate a model. Supplement B in the S1 File describes each performance evaluation measure.
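The label-based measures can all be derived from a confusion matrix, as in the Python sketch below. It assumes that the overall precision, recall, and F1 score are macro-averaged (the unweighted mean over the three classes); the text does not state which averaging was used, and weighted averaging would give different numbers. AUC is omitted because it requires predicted probabilities rather than class labels. The function names are illustrative; scikit-learn offers `cohen_kappa_score` and `precision_recall_fscore_support` for the same quantities.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str]) -> List[List[int]]:
    """cm[i][j] = number of rows with true class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def evaluate(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str]) -> Dict[str, object]:
    """Accuracy, Cohen's kappa, and per-class and macro-averaged precision, recall, F1."""
    cm = confusion_matrix(y_true, y_pred, labels)
    k = len(labels)
    n = sum(map(sum, cm))
    true_totals = [sum(cm[i]) for i in range(k)]
    pred_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_observed = sum(cm[i][i] for i in range(k)) / n  # equals accuracy
    p_chance = sum(true_totals[i] * pred_totals[i] for i in range(k)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    precision: Dict[str, float] = {}
    recall: Dict[str, float] = {}
    f1: Dict[str, float] = {}
    for i, label in enumerate(labels):
        precision[label] = cm[i][i] / pred_totals[i] if pred_totals[i] else 0.0
        recall[label] = cm[i][i] / true_totals[i] if true_totals[i] else 0.0
        pr = precision[label] + recall[label]
        f1[label] = 2 * precision[label] * recall[label] / pr if pr else 0.0
    return {
        "accuracy": p_observed,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "macro_precision": sum(precision.values()) / k,
        "macro_recall": sum(recall.values()) / k,
        "macro_f1": sum(f1.values()) / k,
    }
```

Cohen’s kappa compares the observed agreement with the agreement expected by chance from the class frequencies, which makes it more informative than accuracy alone when the classes are imbalanced.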
Feature importance
Identifying important features is a key part of machine learning prediction. Feature importance scores show how much each feature contributes to the model’s decisions. We used two feature importance methods, (a) Mean Decrease in Impurity (MDI) and (b) Permutation Importance (PI), to identify the most important features, applying both to the algorithm that yielded the best results.
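Permutation importance is model-agnostic: shuffle one feature column, re-score the fitted model, and record how much the score drops. The Python sketch below assumes accuracy as the score and treats the fitted model as any function from a feature row to a class label; both choices, and the function names, are assumptions. For a random forest, scikit-learn exposes MDI through the `feature_importances_` attribute of `RandomForestClassifier` and implements permutation importance in `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Share of rows whose predicted label equals the true label."""
    return sum(model(row) == target for row, target in zip(X, y)) / len(y)


def permutation_importance(
    model: Model, X: Sequence[Row], y: Sequence[int], n_repeats: int = 10, seed: int = 0
) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero, because shuffling it cannot change any prediction. Permutation importance is usually computed on held-out data, whereas MDI is derived from the splits learned on the training data, which is one reason the two rankings can differ.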
Ethical approval
This study used publicly available secondary data from the BDHS 2017–2018. The survey was conducted with ethical approval from the Institutional Review Boards of ICF Macro in Calverton, MD, USA, and the Bangladesh Medical Research Council. All participants were informed of the study’s purpose, risks and benefits, future use of data, confidentiality, and anonymity, and they provided informed consent. All identifying information had been removed from the data before they were downloaded from the BDHS website [4].
Results
Baseline characteristics
S2 Table presents the background characteristics of the pregnant women in this study. The largest shares of respondents were from the Chittagong (15.4%) and Dhaka (15.3%) divisions. The highest proportion of mothers (34.4%) belonged to the 20–24 age group, and most respondents’ husbands (77%) were aged 20–30 years, although some pregnant women were younger than 20 or older than 35. Nearly half of the pregnant women (48.5%) had secondary education, and only 4.3% could not read and write. Among the respondents’ partners, 12.6% had no education, 33.2% had primary education, 35.3% had secondary education, and 19% had higher education. The majority of pregnant women (67.2%) were not currently working, and the largest share of husbands (35.4%) worked as employees. About half of the pregnant women (50.5%) had 1–2 children. Approximately 20% of respondents each came from poor and rich households, and 18.8% from middle-class households. Most pregnant women (64.7%) had access to mass media, while 35.3% did not. Three-quarters (75%) of pregnancies were intended, and only 6% of respondents were breastfeeding. Most respondents lived in rural areas (64.6%) and had improved drinking water sources at home.
Machine learning algorithm specifications
Table 3 summarizes the specifications of the machine learning algorithms used in this study. The best parameters for these algorithms were determined using 10-fold cross-validation, which helps guard against overfitting.
Machine learning algorithms performance evaluation
This study applied four feature transformation (FT) methods, Standardization, Min-Max, log, and Robust scaling (referred to as FT1, FT2, FT3, and FT4), together with ten machine learning (ML) algorithms to classify the nutritional status of pregnant women. The algorithms were evaluated using several performance measures: accuracy, kappa statistic, precision, recall, F1 score, and AUC. Tables 4–8 present each algorithm’s classification accuracy, kappa statistic, precision, F1 score, and recall, and Tables 9–12 show the prediction results for the underweight and overweight/obese classes, including AUC, precision, F1 score, and recall. The study also evaluated the ML algorithms without any transformation; compared with these results, the FT methods improved classification accuracy and the other performance measures. These results are reported in S3 Table.
Variable importance from best performing algorithm
After evaluating the ML algorithms, we applied two feature importance approaches, MDI and PI, to the RF algorithm with robust scaling to rank the features of the dataset. The respondent’s current age, wealth index, region, husband’s education level, husband’s age, and husband’s occupation were the most important features for predicting the nutritional status of pregnant women. In contrast, the total number of children ever born, religion, number of living children, and toilet facility were the least predictive features according to both feature importance methods (Fig 5). S4 Table presents the feature importance ranking of the robust-scaled dataset for the RF algorithm.
Discussion
Several prediction models have been developed to identify the nutritional status of children and women in Bangladesh [2, 3, 13]. However, research on the potential use of machine learning techniques to predict the nutritional status of pregnant women in Bangladesh is lacking. The main aim of this study was to predict the nutritional status (underweight, normal weight, overweight/obese) of currently pregnant women in Bangladesh. This study applied four feature transformation (FT) methods and then ten well-known machine learning algorithms: decision tree, logistic regression, random forest, support vector machine, k-nearest neighbor, naïve Bayes, adaptive boosting (ADB), eXtreme Gradient Boosting (XGB), gradient boosting, and bagging. All models were trained using 10-fold cross-validation on the training dataset.
The results of this study revealed that FT4 (robust scaling) was the best transformation for predicting pregnant women’s nutritional status, as it yielded the best performance measures for all classifiers. With FT4, the RF algorithm achieved the highest accuracy (74.75%), kappa statistic (57.91%), precision (73.36%), recall (73.08%), and F1 score (73.09%) of all the algorithms applied in this investigation. The RF classifier also had a high precision (76.76%) and F1 score (71.71%) for the underweight class, and a precision of 82.01% and F1 score of 83.78% for the overweight/obese class.
Balabaeva et al. [41] examined the impact of various feature scaling methods on heart failure patient datasets using LR, XGB, DT, and RF algorithms with the Standard Scaler, Max Abs Scaler, MinMax Scaler, Quantile Transformer, and Robust Scaler. They found that RF performed better with the Standard Scaler and Robust Scaler, which is consistent with our findings. Similarly, M. Ahsan evaluated eleven ML algorithms and six data scaling methods (Normalization, MinMax, Standard Scaler, MaxAbs, Quantile Transformer, and Robust Scaler) on a dataset of heart disease patients and found that the CART algorithm combined with the Quantile Transformer or Robust Scaler outperformed all other ML algorithms [40].
Islam et al. found that the RF algorithm had the best prediction accuracy and the highest AUC compared with other machine learning algorithms for health issues, including women’s nutritional status [2]. Khudri et al. found that the ADB, RF, and XGB algorithms were the most effective at predicting the nutritional status of women of childbearing age [3], supporting this study’s findings. J. Ali et al. [42] developed a nutritional prediction model for Pakistani women using Support Vector Machine, Logistic Regression, Random Forest, K-nearest neighbor, and Naïve Bayes algorithms and found that Random Forest had the highest accuracy. B. Alamma et al. [43] used Random Forest (RF) and Decision Tree (DT) classifiers to analyze risk factors for obesity and overweight among women and found that the Random Forest algorithm produced the best results, with an accuracy of 77% and an F1 score of 75%. Dunstan predicted nationwide obesity from food sales and found that RF had the best performance, which supports the findings of this study [44]. Talukder and Ahammed applied RF, LR, SVM, k-NN, and LDA algorithms to predict malnutrition in under-five Bangladeshi children and found that the RF algorithm performed best, with a specificity of 69.76%, sensitivity of 94.66%, and accuracy of 68.51% [14]. In another study predicting under-five malnutrition in Bangladesh, S. Ahmed et al. found that the RF algorithm performed best, with accuracies of 70.1% and 72.4% and AUCs of 69.8% and 70% for stunting and underweight, respectively [45]. In a recent study predicting childhood anemia in Bangladesh, Khan and colleagues showed that the RF algorithm achieved the highest accuracy (68.53%), with a specificity of 66.41%, sensitivity of 70.73%, and AUC of 0.6857 [13]. Rahman et al. found that, compared with other algorithms, the RF algorithm performed best for predicting infant mortality in Bangladesh, with an AUC of 0.6590, accuracy of 0.8890, specificity of 0.9789, sensitivity of 0.0480, F1 score of 0.0771, and precision of 0.1960 [46]. In Ahmadi’s study on predicting low birth weight, the Random Forest algorithm outperformed all other algorithms (total accuracy: 95%; area under the ROC curve: 93%; kappa coefficient: 66%) [47]. S. Rahman et al. [48] implemented three ML classifiers (support vector machine, LR, and random forest) to predict malnutrition in children; the RF algorithm achieved the highest accuracy, at 87.7% for wasting, 88.3% for stunting, and 85.7% for underweight. Random Forest also outperformed other algorithms in Chilyabanyama’s research on predicting stunting among children under five in Zambia, which supports the current investigation’s findings [49].
In addition to identifying the best predictive model, this study also determined the essential features for predicting nutritional status among currently pregnant women in Bangladesh, based on the best-performing algorithm. The RF feature importance scores suggested that the wealth index, respondent’s age, region, husband’s education level, husband’s age, and husband’s occupation are the six most important features for predicting the nutritional status of pregnant women in Bangladesh. Household wealth status is a significant factor in determining maternal health care. In this study, mothers with poor socioeconomic status faced a greater risk of being underweight than those with high socioeconomic status, which is consistent with a previous study [50]. This finding also aligns with previous studies that have linked wealth index and women’s working status to maternal underweight and overweight/obesity [51]. Respondent’s age is a vital indicator of the nutritional status of pregnant women; some previous studies found that maternal age during the third trimester of pregnancy is a risk factor for developing malnutrition [52, 53]. The husband’s age is also an important feature for the nutritional status of pregnant women: one study found that overweight was more prevalent among women whose husbands were aged 31 years or above (29%) [3]. The current study also found that the husband’s education level is an important factor related to the nutritional status of pregnant women, which is consistent with previous studies by M. Fite [54] and Hossain [32]. A study conducted in a rural area of Assam, India, found that malnutrition among pregnant women was significantly associated with their husbands’ occupation and reported a strong positive relationship between BMI and the husband’s occupation, which supports this study’s results [55]. Another study by M. Fite et al. showed that husbands’ occupation can significantly affect pregnant women’s nutritional status and dietary practices [54]. According to a study of women living in Bangladesh, place of residence is the most important factor in pregnant women’s health status. This study’s findings align with previous research by M. Islam [2] and another nationally representative study that used BDHS data [56].
Despite their usefulness, ML models have limitations; for example, they do not provide odds ratios or coefficients indicating the direction of the relationship between the important features and the outcome. Knowing the direction of each feature’s association with nutritional status would improve the design and implementation of interventions to prevent malnutrition among pregnant women in Bangladesh.
Strengths and limitations
This study has limitations. It relies on cross-sectional data, which limits the information available on other related factors; incorporating such factors could potentially increase the predictive power and AUC of the algorithms. In addition, the analysis did not adjust for sampling weights. Despite these limitations, a strength of this study is that it identified the best-performing ML algorithm using multiple performance evaluation measures.
Conclusions
Malnutrition is a significant health concern in developing nations. This study compared and assessed the effectiveness of various machine learning (ML) algorithms in predicting the nutritional status of pregnant women in Bangladesh. We applied FT methods to the dataset, trained various algorithms on the transformed data, and evaluated their performance. The RF algorithm with robust scaling performed best. According to the RF algorithm, the most important features for predicting the nutritional status of pregnant women in Bangladesh are the respondent’s age, wealth index, region, husband’s education level, husband’s age, and husband’s occupation. These findings may help healthcare providers and policymakers develop a framework for implementing the interventions and care practices needed to prevent severe complications and reduce the burden of malnutrition.
Supporting information
S1 Table. Association between pregnant women’s nutritional status (BMI) and demographic and socio-economic characteristics.
https://doi.org/10.1371/journal.pone.0304389.s001
(DOCX)
S2 Table. Background characteristics of the pregnant women in Bangladesh.
https://doi.org/10.1371/journal.pone.0304389.s002
(DOCX)
S3 Table. Evaluation of prediction performance (%) of different ML Algorithms for Overall nutritional status, underweight, and overweight/Obese without any FT methods.
https://doi.org/10.1371/journal.pone.0304389.s003
(DOCX)
S4 Table. Feature importance ranking for the best-performing algorithm.
https://doi.org/10.1371/journal.pone.0304389.s004
(DOCX)
Acknowledgments
The authors thank the Demographic and Health Survey for allowing them to use its open-access data for this study.
References
- 1. WHO. The double burden of malnutrition: policy brief. 2016. https://apps.who.int/iris/handle/10665/255413
- 2. Islam MM, Rahman MJ, Islam MM, Roy DC, Ahmed NAMF, Hussain S, et al. Application of machine learning-based algorithm for prediction of malnutrition among women in Bangladesh. International Journal of Cognitive Computing in Engineering. 2022;3: 46–57.
- 3. Khudri MM, Rhee KK, Hasan MS, Ahsan KZ. Predicting nutritional status for women of childbearing age from their economic, health, and demographic features: A supervised machine learning approach. Ahmad T, editor. PLOS ONE. 2023;18: e0277738. pmid:37172042
- 4. National Institute of Population Research and Training (NIPORT), and ICF. Bangladesh Demographic and Health Survey 2017–18: Key Indicators. Dhaka, Bangladesh, and Rockville, Maryland, USA: NIPORT, and ICF; 2019.
- 5. K DR. Assessment of Nutritional Status in Pregnant Women. International Journal of Health Sciences and Research. 2020. Available: www.ijhsr.org
- 6. Bhanbhro S, Kamal T, Diyo RW, Lipoeto NI, Soltani H. Factors affecting maternal nutrition and health: A qualitative study in a matrilineal community in Indonesia. PLoS ONE. 2020;15. pmid:32544180
- 7. Lama N, Lamichhane R, S K. C., Bhandari GP, Wagle RR. Determinants of nutritional status of pregnant women attending antenatal care in Western Regional Hospital, Nepal. International Journal Of Community Medicine And Public Health. 2018;5: 5045.
- 8. Muze M, Yesse M, Kedir S, Mustefa A. Prevalence and associated factors of undernutrition among pregnant women visiting ANC clinics in Silte zone, Southern Ethiopia. BMC Pregnancy and Childbirth. 2020;20: 1–9. pmid:33213406
- 9. Wakwoya EB, Belachew T, Girma T. Determinants of nutritional status among pregnant women in East Shoa zone, Central Ethiopia. Frontiers in Nutrition. 2022;9. pmid:36590215
- 10. Khanum H. Malnutrition and Associated Disorders Among Pregnant Women in Keranigonj, Bangladesh. Biomedical Journal of Scientific & Technical Research. 2021;36.
- 11. Khare S, Kavyashree S, Gupta D, Jyotishi A. Investigation of Nutritional Status of Children based on Machine Learning Techniques using Indian Demographic and Health Survey Data. Procedia Computer Science. Elsevier B.V.; 2017. pp. 338–349.
- 12. Biswas T, Garnett SP, Pervin S, Rawal LB. The prevalence of underweight, overweight and obesity in Bangladeshi adults: Data from a national survey. PLoS ONE. 2017;12. pmid:28510585
- 13. Khan JR, Chowdhury S, Islam H, Raheem E. Machine Learning Algorithms To Predict The Childhood Anemia In Bangladesh. Journal of Data Science. 2021;17: 195–218.
- 14. Talukder A, Ahammed B. Machine learning algorithms for predicting malnutrition among under-five children in Bangladesh. Nutrition. 2020;78: 110861. pmid:32592978
- 15. Fenta HM, Zewotir T, Muluneh EK. A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones. BMC Medical Informatics and Decision Making. 2021;21. pmid:34689769
- 16. Mukuku O, Mutombo AM, Kamona LK, Lubala TK, Mawaw PM, Aloni MN, et al. Predictive model for the risk of severe acute malnutrition in children. Journal of Nutrition and Metabolism. 2019;2019. pmid:31354989
- 17. Hossain MI, Habib MJ, Saleheen AAS, Kamruzzaman M, Rahman A, Roy S, et al. Performance Evaluation of Machine Learning Algorithm for Classification of Unintended Pregnancy among Married Women in Bangladesh. Journal of Healthcare Engineering. 2022;2022. pmid:35669979
- 18. Li S, Liu T. Performance Prediction for Higher Education Students Using Deep Learning. Complexity. 2021;2021.
- 19. Zhao Y, Healy BC, Rotstein D, Guttmann CRG, Bakshi R, Weiner HL, et al. Exploration of machine learning techniques in predicting multiple sclerosis disease course. PLOS ONE. 2017;12: e0174866. pmid:28379999
- 20. World Health Organization. The world health report 2000: health systems: improving performance. 2000. https://books.google.com/books?hl=en&lr=&id=luqgKK2euxoC&oi=fnd&pg=PR7&dq=(WHO,+2000).&ots=sNpb8dbIV7&sig=MRP-PsWH1TGqqMy9JseIbLDA9XE
- 21. Rasmussen KM, Yaktine AL. Weight Gain During Pregnancy: Reexamining the Guidelines. 2009 [cited 2 Jul 2023]. pmid:20669500
- 22. Goldstein RF, Abell SK, Ranasinha S, Misso ML, Boyle JA, Harrison CL, et al. Gestational weight gain across continents and ethnicity: systematic review and meta-analysis of maternal and infant outcomes in more than one million women. BMC medicine. 2018;16. pmid:30165842
- 23. Thomson AM, Billewicz WZ. Clinical Significance of Weight Trends During Pregnancy. British Medical Journal. 1957;1: 243. pmid:13383239
- 24. Abrams B, Carmichael S, Selvin S. Factors associated with the pattern of maternal weight gain during pregnancy. Obstetrics and Gynecology. 1995;86: 170–176. pmid:7617345
- 25. Carmichael S, Abrams B, Selvin S. The pattern of maternal weight gain in women with good pregnancy outcomes. American journal of public health. 1997;87: 1984–1988. pmid:9431288
- 26. Mazumder T, Akter E, Rahman SM, Islam MT, Talukder MR. Prevalence and Risk Factors of Gestational Diabetes Mellitus in Bangladesh: Findings from Demographic Health Survey 2017–2018. International Journal of Environmental Research and Public Health. 2022;19. pmid:35270274
- 27. Kominiarek MA, Peaceman AM. Gestational Weight Gain. American journal of obstetrics and gynecology. 2017;217: 642. pmid:28549978
- 28. Rasmussen KM, Yaktine AL, Institute of Medicine (US) and National Research Council (US) Committee to Reexamine IOM Pregnancy Weight Guidelines. Determining Optimal Weight Gain. 2009 [cited 10 Jan 2024]. https://www.ncbi.nlm.nih.gov/books/NBK32801/
- 29. Karim MR, Mamun ASM Al, Hossain MR, Islam MN, Rana MM, Wadood MA, et al. Nutritional status of tribal and non-tribal adults in rural Bangladesh: A comparative study. PLOS ONE. 2023;18: e0287625. pmid:37450509
- 30. Dutta M, Selvamani Y, Singh P, Prashad L. The double burden of malnutrition among adults in India: evidence from the National Family Health Survey-4 (2015–16). Epidemiology and health. 2019;41. pmid:31962037
- 31. Tanwi TS, Chakrabarty S, Hasanuzzaman S. Double burden of malnutrition among ever-married women in Bangladesh: A pooled analysis. BMC Women’s Health. 2019;19: 2–9. pmid:30704454
- 32. Hossain S, Khudri MM, Banik R. Regional education and wealth-related inequalities in malnutrition among women in Bangladesh. Public Health Nutrition. 2022;25: 1639–1657. pmid:34482847
- 33. Bhandari P, Gayawan E, Yadav S. Double burden of underweight and overweight among Indian adults: spatial patterns and social determinants. Public Health Nutrition.: 2808–2822. pmid:33875031
- 34. Blum AL, Langley P. Selection of relevant features and examples in machine learning. Artificial Intelligence. 1997;97: 245–271.
- 35. Rao JNK, Scott AJ. The analysis of categorical data from complex sample surveys: Chi-squared tests for goodness of fit and independence in two-way tables. Journal of the American Statistical Association. 1981;76: 221–230.
- 36. Thomas DR, Decady YJ. Testing for Association Using Multiple Response Survey Data: Approximate Procedures Based on the Rao-Scott Approach. International Journal of Testing. 2004;4: 43–59.
- 37. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 2002;16: 321–357.
- 38. Cao XH, Stojkovic I, Obradovic Z. A robust data scaling algorithm to improve classification accuracies in biomedical data. BMC Bioinformatics. 2016;17: 1–10.
- 39. Zhang Y, Wang J, Luo X. Probabilistic wind power forecasting based on logarithmic transformation and boundary kernel. Energy Conversion and Management. 2015;96: 440–451.
- 40. Ahsan MM, Mahmud MAP, Saha PK, Gupta KD, Siddique Z. Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies. 2021;9: 52.
- 41. Balabaeva K, Kovalchuk S. Comparison of Temporal and Non-Temporal Features Effect on Machine Learning Models Quality and Interpretability for Chronic Heart Failure Patients. Procedia Computer Science. 2019;156: 87–96.
- 42. Ali J, Waseemullah, Khan MA, Khan NA. Machine Learning Approaches for Prediction of Nutrition Deficiency among Women of Different Age Groups. 3rd International Conference on Innovations in Computer Science and Software Engineering, ICONICS 2022. 2022.
- 43. Alamma BH. Data Analysis on the Risks of Obesity and Overweight in Women-A Study. 2021;7: 34–38.
- 44. Dunstan J, Aguirre M, Bastías M, Nau C, Glass TA, Tobar F. Predicting nationwide obesity from food sales using machine learning. Health Informatics Journal. 2020;26: 652–663. pmid:31106648
- 45. Ahmed Hemo S, Israt Rayhan M. Classification tree and random forest model to predict under-five malnutrition in Bangladesh. 2014.
- 46. Rahman A, Hossain Z, Kabir E, Rois R. An assessment of random forest technique using simulation study: illustration with infant mortality in Bangladesh. Health Information Science and Systems. 2022;10: 1–8.
- 47. Ahmadi P, Majd HA, Khodakarim S, Tapak L, Kariman N, Amini P, et al. Prediction of low birth weight using Random Forest: A comparison with Logistic Regression. Archives of Advances in Biosciences. 2017;8: 36–43.
- 48. Rahman SMJ, Ahmed NAMF, Abedin MM, Ahammed B, Ali M, Rahman MJ, et al. Investigate the risk factors of stunting, wasting, and underweight among under-five Bangladeshi children and its prediction based on machine learning approach. PLOS ONE. 2021;16: e0253172. pmid:34138925
- 49. Chilyabanyama ON, Chilengi R, Simuyandi M, Chisenga CC, Chirwa M, Hamusonde K, et al. Performance of Machine Learning Classifiers in Classifying Stunting among Under-Five Children in Zambia. Children. 2022;9. pmid:35884066
- 50. Methun MIH, Haq I, Uddin MSG, Rahman A, Islam S, Hossain MI, et al. Socioeconomic correlates of Adequate Maternal Care in Bangladesh: Analysis of the Bangladesh Demographic and Health Survey 2017–18. BioMed Research International. 2022;2022. pmid:36398069
- 51. Khanam R, Lee ASCC, Ram M, Quaiyum M, Begum N, Choudhury A, et al. Levels and correlates of nutritional status of women of childbearing age in rural Bangladesh. Public Health Nutrition. 2018;21: 3037–3047. pmid:30107861
- 52. Khanam SZ, Khanum H. Malnutrition and Associated Disorders Among Pregnant Women in Keranigonj, Bangladesh. Biomedical Journal of Scientific & Technical Research. 2021;36: 28715–28724.
- 53. Alkalash SH, Elnady RT, Khalil NA, Hegazy NN. Dietary practice and nutritional status among pregnant women attending antenatal care of egyptian, rural family health unit. Egyptian Journal of Hospital Medicine. 2021;83: 1030–1037.
- 54. Fite MB, Tura AK, Yadeta TA, Oljira L, Roba KT. Prevalence and determinants of dietary practices among pregnant women in eastern Ethiopia. BMC Nutrition. 2022;8: 1–10.
- 55. Mahanta LB, Roy TD, Dutta RG, Devi A. Nutritional status and the impact of socioeconomic factors on pregnant women in Kamrup district of Assam. Ecology of food and nutrition. 2012;51: 463–480. pmid:23082918
- 56. Biswas RK, Rahman N, Khanam R, Baqui AH, Ahmed S. Double burden of underweight and overweight among women of reproductive age in Bangladesh. Public Health Nutrition. 2019;22: 3163. pmid:31544733