Abstract
Metabolic syndrome (MetS) is a cluster of interconnected metabolic risk factors, including abdominal obesity, high blood pressure, and elevated fasting blood glucose, that together increase the risk of heart disease and stroke. In this research, we aim to identify the risk factors associated with MetS in the Bangladeshi population, to construct predictive machine learning (ML) models, and to assess the accuracy and reliability of these models. We applied the ATP III criteria to evaluate health parameters from a dataset of 8185 participants in Bangladesh. The prevalence of MetS was 27.8%; females accounted for 58.3% of MetS cases and males for 41.7%. We first identified the most important variables using Chi-Square and Random Forest techniques, and then used the selected variables to train Decision Tree, Random Forest, Support Vector Machine, Extreme Gradient Boosting, K-nearest neighbors, and Logistic Regression models. In particular, within the ATP III framework we used the Waist-to-Height Ratio (WHtR) as an anthropometric index for diagnosing abdominal obesity. Chi-Square and Random Forest analyses identified Age, systolic blood pressure (SBP), WHtR, fasting blood glucose (FBG), waist circumference (WC), diastolic blood pressure (DBP), marital status, hip circumference (HC), triglycerides (TGs), and smoking as the most significant factors. However, further investigation is needed to evaluate the precision of WHtR as a classification tool and to improve the accuracy of all classifiers for MetS prediction.
Citation: Hossain MF, Hossain S, Akter MN, Nahar A, Liu B, Faruque MO (2024) Metabolic syndrome predictive modelling in Bangladesh applying machine learning approach. PLoS ONE 19(9): e0309869. https://doi.org/10.1371/journal.pone.0309869
Editor: Aleksandra Klisic, University of Montenegro-Faculty of Medicine, MONTENEGRO
Received: February 15, 2024; Accepted: August 12, 2024; Published: September 5, 2024
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: The dataset is publicly available at https://datadryad.org/stash/dataset/doi:10.5061/dryad.zkh18937f.
Funding: The author(s) received no specific funding for this work.
Competing interests: There is no conflict of interest among the authors.
Introduction
Metabolic syndrome (MetS) is a cluster of conditions that, in combination, increase a patient's vulnerability to several life-threatening diseases, such as coronary heart disease, diabetes, and stroke, as well as other serious health problems [1]. Approximately 422 million people worldwide have diabetes, most of whom live in low- and middle-income countries [2], and diabetes is directly responsible for 1.5 million deaths each year. Over the past few decades, both the number of cases and the incidence of diabetes have risen steadily. Metabolic syndrome, also known as insulin resistance syndrome or syndrome X, encompasses a range of risk factors related to blood pressure, glucose, and plasma lipid levels. In this context, hypertension is defined as a systolic blood pressure of at least 130 mm Hg, a diastolic blood pressure of at least 85 mm Hg, or the use of antihypertensive medication. The risk of sudden cardiac death has been found to be 70% higher in people with MetS. According to the United States National Heart, Lung, and Blood Institute, 1 in 3 Americans has MetS [3]. A 2018 non-communicable disease risk factor survey found that 15.5% of Bangladeshis aged 40–69 are at risk of cardiovascular disease [4], and a systematic review and meta-analysis [5] found that 37.0% of Bangladeshi people have MetS. Because MetS is highly prevalent and a serious public health concern, its potential risk factors need to be investigated. According to the National Institutes of Health, the risk of MetS increases with age [6]. Lifestyle factors such as physical inactivity, poor diet, poor-quality sleep, smoking, and excessive alcohol consumption, as well as low socioeconomic status and irregular shift work, may also increase the risk of MetS.
The odds of MetS in adults who slept less than six hours per day were roughly 4.6 times those of adults who slept longer [OR: 4.62; 95% CI: (1.02, 20.98)] [7]. Genetic factors, family history, and obesity also worsen the condition, because they can lower "good" HDL cholesterol, raise "bad" LDL cholesterol, and adversely affect blood triglycerides and blood pressure [8]. Certain medical conditions, such as polycystic ovary syndrome (PCOS) and insulin resistance, can also increase the risk of developing MetS [9], and maternal overweight and obesity during pregnancy can increase the child's risk of developing MetS [10]. The impact of socioeconomic conditions on health is illustrated by a study showing that rural populations had significantly higher tobacco use (45.2%), more frequent inadequate fruit and vegetable intake (92.1%), and higher daily salt intake (9.0 g) than urban populations [11]. According to the Bangladesh Demographic and Health Survey 2017–18, socioeconomic condition, age, sex, obesity, hypertension, wealth, and living conditions all affected the prevalence of diabetes [12]. Of those with diabetes, 61.5% were unaware of their condition, 35.2% were receiving regular treatment, and 30.4% had it under control [13]. Predictive healthcare strategies based on machine learning (ML) can help predict and mitigate the impact of metabolic diseases, particularly in countries such as Bangladesh where the prevalence of diabetes, obesity, and cardiovascular disease is rising. ML makes data-driven research more efficient by reducing the burden of manual inspection and by enabling new predictive models to be built. Feature selection techniques such as random forests, chi-square tests, and correlation plots help identify the key factors in large datasets that contain many less informative variables.
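An odds ratio with a 95% confidence interval, such as the one quoted above, can be computed from a 2×2 table of exposure (short sleep) by outcome (MetS). The following is a minimal sketch using Woolf's log-scale method; the counts are hypothetical and are not taken from [7]. Small cell counts inflate the standard error of log(OR), which is why intervals such as (1.02, 20.98) can be so wide.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: exposed with MetS      b: exposed without MetS
    c: unexposed with MetS    d: unexposed without MetS
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's method
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, with a long upper tail.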
Our study aims to identify the potential risk factors associated with metabolic diseases and to develop a predictive model for them using an ML approach. ML is well suited to accurate prediction because it can recognize patterns and relationships, analyze larger datasets, and generate predictions quickly, making it more efficient than traditional methods, which are time-consuming and vulnerable to bias [14]. According to a nationwide cross-sectional survey conducted in 2018, 12.3% of adult Bangladeshis reported insufficient physical activity, with higher rates among women (14.8%) and urban residents (14.1%). The same survey found that the prevalence of overweight and obesity was 25.9%, and considerably higher among women (33.7%), urban residents (34.3%), and the wealthiest group (34.3%) [11].
In this research, we aim to identify the risk factors that have an impact on MetS. Subsequently, we intend to construct predictive ML models and ultimately, assess the accuracy and reliability of these predictive ML models.
Related work
In the last 30 years, there has been a significant rise in the global prevalence of MetS [15]. The International Diabetes Federation (IDF), the World Health Organization (WHO), the European Group for the Study of Insulin Resistance (EGIR), and the US National Cholesterol Education Program Adult Treatment Panel III (NCEP ATP III) have defined and published separate clinical criteria for MetS [16].
Mohammad Ziaul Islam Chowdhury et al. examined the prevalence of MetS in Bangladesh through a meta-analysis [5]. They found that the prevalence of MetS was higher in females (32%) than in males (25%), although the difference was not statistically significant (p = 0.434). Prevalence was highest (37%) when the modified NCEP ATP III criteria were used and lowest (20%) when the WHO criteria were applied. Geographical factors also played a significant role in the variation observed across studies [5].
Suparno Datta et al. used ML to detect MetS at an early stage [15], relying exclusively on non-invasive features such as height, weight, waist circumference (WC), triglycerides (TGs), blood sugar, and HDL levels. An ensemble learning approach performed best, with gradient boosting machines (GBMs) and RF close behind. The findings indicate that ML can effectively forecast MetS without invasive biomarkers, making early detection more convenient [15].
Guadalupe Obdulia Gutiérrez-Esparza et al. proposed an ML approach to predict MetS in the Mexican population [17]. Random Forest was used to rank health parameters, and the key predictors of MetS, in order of importance, were FPG, TGs, WHtR, HDL-C, and BMI. When the data were analyzed with both the Random Forest and chi-squared methods, WHtR had the highest values and was the most significant factor. When evaluated with the C4.5 and JRip algorithms, WHtR also gave better balanced accuracy, sensitivity, and specificity. The RF model using the ATP III variables performed best, with a balanced accuracy of 0.875, closely followed by JRip [17].
Shu-Jie Xia et al. developed a diagnostic model for MetS that incorporates symptoms into a physicochemical index [18]. They compared three traditional machine learning methods on their cohort: decision tree (DT), support vector machine (SVM), and random forest (RF). Among the three, the RF model performed best, with the highest average accuracy (0.942; 95% CI [0.925, 0.958]) and sensitivity (0.993; 95% CI [0.990, 0.996]). The study highlighted the importance of the TGM indexes in predicting MetS [18].
Darko Ivanovic et al. proposed an artificial neural network (ANN) to predict a diagnosis of MetS using only non-invasive, inexpensive, and readily available diagnostic measures [16]. The input vectors comprised gender, age, body mass index (BMI), waist-to-height ratio, and systolic and diastolic blood pressure. The ANN predicted both positive and negative cases of MetS effectively, which can support the early prevention of metabolic syndrome. The highest positive predictive value (PPV) was 0.858, and the negative predictive value (NPV) was similar at 0.832 [16].
Hui Zhang et al. also used ML in a retrospective cohort study to predict the probability that adults would develop MetS within a 4-year period. Three ML techniques, namely ANN, classification and regression tree (CART), and SVM, were evaluated alongside a logistic regression model. All models except CART in internal validation had discrimination values greater than 0.7. In external validation, the logistic regression model showed the highest discrimination. The ANN model also showed satisfactory calibration in both external (0.780) and internal (0.788) validation [19].
Mohammad Salim Hossain et al. examined MetS among individuals with diabetes living in a coastal area of Bangladesh [20]. Approximately 47.00% of patients with type 2 diabetes mellitus had MetS. The prevalence of MetS was higher in females (58.60%) than in males (36.14%). Females also showed higher rates of obesity and hypertriglyceridemia and lower HDL levels, and the 55–64 age group had the highest occurrence of MetS [20].
Suresh Mehata et al. evaluated the prevalence and determinants of MetS in Nepalese adults in a nationally representative study. The most common combination of components was low HDL-C, abdominal obesity, and high blood pressure (8.18% of cases), closely followed by abdominal obesity, low HDL-C, and high triglyceride levels (8%). Fewer than 2% of participants had all five components of the syndrome, while 19% had none. Prevalence rose consistently with age and was highest, at 28% to 30%, among adults aged 45 to 69 [21].
Hayat Ali Shah et al. used deep neural networks to generate feature representations of metabolic pathways, which were then fed into random forests for pathway prediction. The resulting DeepRF model accurately predicts both known and unknown metabolic pathways in organisms. Tested on a dataset of 318,016 instances, it achieved high accuracy (>97%), recall (>95%), and precision (>99%), and it consistently gave more reliable results than the other methods it was compared with [22].
Methodology
Materials
Data source.
The dataset used in this research was taken from the "National STEPS Survey for Non-communicable Diseases (NCDs) Risk Factors in Bangladesh 2018", which was conducted by the National Institute of Preventive and Social Medicine (NIPSOM) under the World Health Organization (WHO) [11]. Dataset links: STATA: https://extranet.who.int/ncdsmicrodata/index.php/access_licensed/download/1763/5374; CSV: https://extranet.who.int/ncdsmicrodata/index.php/access_licensed/download/1763/5375. The study involved 8185 respondents aged 18–69 years, of whom 3804 were male (46.5%) and 4381 female (53.5%). The dataset covers various risk factors for metabolic diseases, including lifestyle habits, clinical and anthropometric measurements, and biochemical evaluations. This national, population-based cross-sectional survey used a multi-stage cluster sampling design to select households and eligible adult men and women (aged 18–69) for an interview and physical examination. The physical examination comprised anthropometry, blood pressure measurement, blood glucose and cholesterol testing, and a urine sample for salt analysis. The survey was carried out using version 3.2 of the WHO NCD STEPS instrument. The questionnaire comprises three STEPS for assessing NCD risk factors, each containing core, expanded, and country-specific questions adapted to local requirements. In Bangladesh, all core modules were included, along with the optional modules on oral health and cervical cancer screening. The questionnaire was translated into Bengali, and the translation was validated through back-translation. Personal information was collected from participants in STEP 1, and height, weight, and hip and waist circumference were measured for individuals who agreed to continue to STEP 2.
After STEP 1 and STEP 2 data had been collected at the selected households, biochemical assessments were carried out the following day at designated locations in each Primary Sampling Unit (PSU). Venous blood samples were analyzed for glucose and total cholesterol, and plasma samples were used to measure the concentrations of glucose, total cholesterol, and HDL cholesterol. Fasting blood samples were collected specifically to identify elevated blood glucose levels. Subjects were classified as having type 2 diabetes if they reported having been told by a doctor that they had the disease (provided the diagnosis was made after the age of 25 and not during pregnancy), if they reported using insulin or a hypoglycemic medication, or if their fasting blood sugar level exceeded 100 mg/dL [23].
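The three-way classification rule above can be expressed as a small predicate. This is a minimal sketch: the `Respondent` record and its field names are hypothetical and do not correspond to the STEPS dataset's actual column names, and the 100 mg/dL cutoff is kept as a parameter exactly as stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Respondent:
    """Hypothetical record layout; not the STEPS dataset's column names."""
    told_by_doctor: bool                  # reported a doctor's diagnosis of diabetes
    age_at_diagnosis: Optional[int]       # None if never diagnosed
    diagnosed_in_pregnancy: bool
    uses_glucose_lowering_drug: bool      # insulin or an oral hypoglycemic agent
    fasting_glucose_mg_dl: Optional[float]


def classify_diabetes(r: Respondent, fbg_cutoff_mg_dl: float = 100.0) -> bool:
    """Return True if any of the three criteria described in the text is met."""
    # 1. Self-reported diagnosis, counted only if made after age 25 and not in pregnancy
    self_reported = (
        r.told_by_doctor
        and r.age_at_diagnosis is not None
        and r.age_at_diagnosis > 25
        and not r.diagnosed_in_pregnancy
    )
    # 2. Current use of insulin or hypoglycemic medication
    medicated = r.uses_glucose_lowering_drug
    # 3. Fasting glucose above the cutoff stated in the text
    high_fbg = (
        r.fasting_glucose_mg_dl is not None
        and r.fasting_glucose_mg_dl > fbg_cutoff_mg_dl
    )
    return self_reported or medicated or high_fbg


print(classify_diabetes(Respondent(False, None, False, False, 112.0)))  # True
print(classify_diabetes(Respondent(True, 24, False, False, 92.0)))      # False
```

The criteria are combined with a logical OR, so meeting any single one is sufficient.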
Habits and lifestyles.
The validated three-STEP questionnaire [11] was used to measure NCD risk factors, collecting data on lifestyle variables such as alcohol consumption, smoking, physical activity levels, and salt intake before meals.
Clinical and anthropometric measurements.
Systolic and diastolic blood pressure were measured following the standard protocol established by the JNC [24]. Waist circumference (WC), height, and weight were measured, and BMI was calculated as weight/height² (kg/m²). WHtR was computed by dividing WC by height (waist/height).
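The two derived anthropometric indices are simple ratios. A minimal sketch, assuming weight in kilograms and height in metres for BMI, and waist and height in the same unit (here centimetres) for WHtR so that the ratio is unitless:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


def whtr(waist_cm: float, height_cm: float) -> float:
    """Waist-to-height ratio; both measurements must share the same unit."""
    return waist_cm / height_cm


print(round(bmi(70.0, 1.65), 1))    # 25.7
print(round(whtr(85.0, 165.0), 3))  # 0.515
```

Mixing units (e.g. waist in cm, height in m) is a common source of error for WHtR, which is why the function expects matching units.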
Biochemical evaluation.
The laboratory tests included fasting blood glucose (FBG), TGs, and HDL cholesterol (HDL-C). Blood samples were obtained after a 12-hour overnight fast.
Diagnostic criteria.
MetS encompasses a combination of major, lifestyle-related, and emerging risk factors. These include abdominal obesity, atherogenic dyslipidemia (elevated triglyceride levels, small LDL particles, and low HDL cholesterol), high blood pressure, insulin resistance (with or without glucose intolerance), and prothrombotic and proinflammatory states [25]. MetS was diagnosed using the ATP III clinical criteria [26], which are shown in Table 1.
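Table 1 is not reproduced here, so the following sketch assumes the commonly cited NCEP ATP III cutoffs: WC > 102 cm in men and > 88 cm in women, TGs ≥ 150 mg/dL, HDL-C < 40 mg/dL in men and < 50 mg/dL in women, blood pressure ≥ 130/85 mm Hg or antihypertensive treatment, and fasting glucose ≥ 100 mg/dL (the revised cutoff; the original 2001 version used 110 mg/dL), with MetS defined as three or more components. The function names are hypothetical, and the thresholds should be replaced with those in Table 1 if they differ; for example, lower waist cutoffs are often used for South Asian populations.

```python
def atp3_component_flags(sex: str, waist_cm: float, tg_mg_dl: float,
                         hdl_mg_dl: float, sbp: float, dbp: float,
                         fbg_mg_dl: float, bp_treated: bool = False,
                         fbg_cutoff: float = 100.0) -> dict:
    """Flag each ATP III component using assumed cutoffs (check against Table 1)."""
    male = sex == "male"
    return {
        "abdominal_obesity": waist_cm > (102 if male else 88),
        "high_triglycerides": tg_mg_dl >= 150,
        "low_hdl": hdl_mg_dl < (40 if male else 50),
        "raised_bp": sbp >= 130 or dbp >= 85 or bp_treated,
        "raised_glucose": fbg_mg_dl >= fbg_cutoff,
    }


def has_mets(**measurements) -> bool:
    """MetS if three or more of the five components are present."""
    return sum(atp3_component_flags(**measurements).values()) >= 3


woman = dict(sex="female", waist_cm=90, tg_mg_dl=160, hdl_mg_dl=45,
             sbp=120, dbp=80, fbg_mg_dl=95)
man = dict(sex="male", waist_cm=95, tg_mg_dl=140, hdl_mg_dl=45,
           sbp=135, dbp=80, fbg_mg_dl=105)
print(has_mets(**woman))  # True: obesity, triglycerides, and HDL components
print(has_mets(**man))    # False: only the blood pressure and glucose components
```

Returning the individual flags, rather than only the final label, makes it easy to tabulate which components drive each diagnosis.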
Methods
Decision tree.
A decision tree is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent decision rules, and each leaf node represents an outcome. It is a graphical tool that shows all the options for solving a problem or making a decision given certain parameters [27]. The method is non-parametric and effective for large datasets, and the data can be split into training and validation sets to construct an optimal decision tree model [28].
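The training/validation split described above can be sketched with scikit-learn's `DecisionTreeClassifier`. This is an illustration on synthetic data from `make_classification`, not the study's data, and choosing `max_depth` on a held-out split is one assumed way of selecting the "optimal" tree.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the study data: 8 predictors, binary outcome.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Choose the tree depth that gives the best accuracy on the validation split.
best_depth, best_acc = None, -1.0
for depth in (2, 3, 4, 5, 6, 8, None):  # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    acc = tree.score(X_val, y_val)
    if acc > best_acc:
        best_depth, best_acc = depth, acc

final_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
final_tree.fit(X_train, y_train)
print(f"best max_depth={best_depth}, validation accuracy={best_acc:.3f}")
# Print the top of the tree as nested if/else decision rules
print(export_text(final_tree, max_depth=2))
```

Limiting depth trades some training fit for rules that generalize better and remain readable.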
Random forest.
Breiman and Cutler [29] introduced Random Forest, a prediction technique that builds an ensemble of CART classification trees and assigns each instance the class chosen by majority vote. This approach typically outperforms individual classification trees in prediction accuracy and can be applied to a variety of prediction problems [30]. For both regression and classification, Random Forest also provides Variable Importance Measures (VIMs) for ranking variables by importance.
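A minimal sketch of ranking variables with Random Forest importance scores, using scikit-learn's impurity-based `feature_importances_`. The data are synthetic: only the first three columns influence the outcome, and the column names are labels borrowed from the study's variables for readability, not real measurements.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
names = ["age", "sbp", "whtr", "fbg", "wc", "dbp", "hc", "tg"]  # labels only
n = 2000
X = rng.normal(size=(n, len(names)))
# Synthetic outcome: only the first three columns influence the log-odds
logit = 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.8 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

rf = RandomForestClassifier(n_estimators=300, min_samples_leaf=10, random_state=0)
rf.fit(X, y)
# Impurity-based variable importance, normalized to sum to 1
ranking = sorted(zip(names, rf.feature_importances_), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name:5s} {score:.3f}")
```

Impurity-based importances can favour continuous or high-cardinality variables; permutation importance (`sklearn.inspection.permutation_importance`) on held-out data is a common cross-check.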
Support vector machine (SVM).
A support vector machine (SVM) is a supervised ML algorithm for classification, regression, and outlier detection. For classification, it finds the boundary (hyperplane) that separates the predefined classes with the largest margin, optionally after mapping the data into a higher-dimensional feature space with a kernel function. SVMs are widely used in fields such as healthcare, natural language processing, signal processing, and speech and image recognition [31].
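A minimal sketch of an RBF-kernel SVM classifier in scikit-learn on synthetic data. Features are standardized first because the margin depends on distances, so variables on large scales (e.g. blood pressure in mm Hg) would otherwise dominate small-scale ones (e.g. WHtR). The kernel and `C` values are illustrative assumptions, not the study's tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=8, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=2)

# Scale features, then fit a maximum-margin classifier with an RBF kernel
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_tr, y_tr)

acc = svm.score(X_te, y_te)
n_sv = svm.named_steps["svc"].n_support_.sum()  # points that define the margin
print(f"test accuracy={acc:.3f}, support vectors={n_sv}")
```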
Extreme gradient boosting (XGBoost).
Extreme Gradient Boosting (XGBoost) is an ensemble ML technique that implements a gradient boosting framework using decision trees. Each new tree is trained to predict the residuals (errors) of the models built so far, and the trees' outputs are summed to produce the final prediction [32].
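The residual-fitting idea can be illustrated with a bare-bones gradient boosting loop for squared-error loss, built from scikit-learn regression trees. This is plain gradient boosting, not XGBoost itself: XGBoost adds second-order gradients, regularization of tree complexity, and many engineering optimizations. The learning rate, depth, and number of rounds are arbitrary assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

learning_rate, n_rounds = 0.1, 100
base = y.mean()                     # stage 0: a constant model
pred = np.full(y.shape, base)
trees = []
for _ in range(n_rounds):
    residual = y - pred             # negative gradient of the squared-error loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)   # add a shrunken correction
    trees.append(tree)


def boosted_predict(X_new):
    """Sum the constant model and every tree's shrunken residual prediction."""
    out = np.full(X_new.shape[0], base)
    for t in trees:
        out += learning_rate * t.predict(X_new)
    return out


mse_start = np.mean((y - base) ** 2)
mse_end = np.mean((y - pred) ** 2)
print(f"training MSE: {mse_start:.3f} -> {mse_end:.3f}")
```

For classification, the same loop runs on the log-odds scale with the residuals of the logistic loss.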
K-nearest neighbors (KNN).
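The k-nearest neighbors (KNN) algorithm classifies an instance by a majority vote among the k training instances closest to it in feature space. KNN-based techniques are also used in data preprocessing to help transform Big Data into Smart Data that are free of noise, redundant information, and missing values [33], which is crucial for accurate data mining and for revealing insightful information.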
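The voting rule is short enough to write from scratch. A minimal sketch with Euclidean distance and toy two-dimensional points; in practice the features would be standardized first so that no single variable dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote of its k nearest training points."""
    # Euclidean distance to every training point, nearest first
    neighbours = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D data (e.g. two standardized risk factors); labels 0 = Normal, 1 = MetS
train_X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.0), (6.0, 7.0)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # 0
print(knn_predict(train_X, train_y, (6.5, 6.2)))  # 1
```

An odd `k` avoids tied votes in binary problems.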
Logistic regression.
A logistic regression model estimates the probability of a binary dependent variable (here, MetS versus Normal) from one or more independent variables by modeling the log-odds of the outcome as a linear combination of the predictors. The model can take multiple input variables into account. Logistic regression is a key technique in ML, and its classification performance generally improves as more relevant data become available.
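A minimal sketch showing the model form: the predicted probability is the logistic (sigmoid) function of a linear combination of the predictors, and exponentiated coefficients are odds ratios. It uses scikit-learn's `LogisticRegression` on synthetic data and checks the manual computation against `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=5, random_state=3)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Linear predictor = log-odds of the positive class
log_odds = X @ clf.coef_[0] + clf.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-log_odds))     # logistic (sigmoid) function
p_sklearn = clf.predict_proba(X)[:, 1]

# exp(coefficient) = odds ratio per one-unit increase in that predictor
odds_ratios = np.exp(clf.coef_[0])
print(np.round(odds_ratios, 2))
print(np.allclose(p_manual, p_sklearn))        # True
```

Note that scikit-learn applies L2 regularization by default, so its coefficients are slightly shrunk compared with an unpenalized fit such as SPSS's.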
Feature selection criteria.
Feature selection is a key step in developing ML models: it identifies the variables in a large dataset that have the strongest influence on the model compared with the other variables. In our study, we used chi-square and random forest methods to identify these important variables.
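A minimal sketch of chi-square feature ranking with scikit-learn's `SelectKBest(chi2)`, on synthetic binary (Yes/No) features whose association with the outcome is known by construction. The feature names are hypothetical. Note that scikit-learn's `chi2` compares only the per-class counts of the "1" category with their expected values, which is not identical to a full 2×2 contingency-table test.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(4)
n = 2000
names = np.array(["age_over_40", "high_whtr", "high_sbp", "smoker", "urban"])
y = rng.integers(0, 2, size=n)                      # 1 = MetS, 0 = Normal


def noisy_copy(agreement):
    """Binary feature that equals y with the given probability."""
    return np.where(rng.random(n) < agreement, y, 1 - y)


X = np.column_stack([
    noisy_copy(0.8),                # strongly associated
    noisy_copy(0.7),                # moderately associated
    noisy_copy(0.6),                # weakly associated
    rng.integers(0, 2, size=n),     # unrelated to y
    rng.integers(0, 2, size=n),     # unrelated to y
])

selector = SelectKBest(chi2, k=3).fit(X, y)
for name, score in sorted(zip(names, selector.scores_), key=lambda p: -p[1]):
    print(f"{name:12s} {score:8.1f}")
print("selected:", list(names[selector.get_support()]))
```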
Statistical analysis.
We used the chi-square test in SPSS (Statistical Package for the Social Sciences) version 28.0 to describe the characteristics of the study participants. We also used Python 3.0 in Jupyter Notebook, with the Pandas, NumPy, Scikit-learn, Seaborn, and Matplotlib libraries, to produce the results.
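For a 2×2 table, the Pearson chi-square test that SPSS reports can be computed by hand. A minimal sketch with hypothetical counts; with one degree of freedom the upper-tail p-value has the closed form erfc(√(χ²/2)). No continuity correction is applied here; `scipy.stats.chi2_contingency`, by contrast, applies Yates' correction to 2×2 tables by default.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = age > 40 (yes/no), columns = MetS (yes/no)
stat, p = pearson_chi_square_2x2([[30, 10], [20, 40]])
print(f"chi2 = {stat:.3f}, p = {p:.2e}")
```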
Metrics
In ML, various performance metrics are used to evaluate the accuracy and effectiveness of a classification model. Some of the key metrics include:
Precision: The proportion of true positive predictions among all positive predictions made by the model. It measures how accurate the model’s positive predictions are [34].
Sensitivity: Also known as recall, it is the proportion of true positive predictions among all actual positive cases. It measures how well the model identifies all the positive cases [34].
Specificity: The proportion of true negative predictions among all actual negative cases. Specificity measures how well the model identifies all the negative cases [34].
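Balanced accuracy: The mean of sensitivity and specificity, which is less affected by class imbalance than plain accuracy.
AUC-ROC: The area under the receiver operating characteristic (ROC) curve, which measures the model's ability to distinguish between positive and negative cases. A higher AUC-ROC value indicates better model performance [35].
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Sensitivity} = \frac{TP}{P} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{N} = \frac{TN}{TN + FP}, \qquad \text{Balanced accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}$$
where P and N are the numbers of actual positive and negative cases, TP = True Positive, FN = False Negative, TN = True Negative, and FP = False Positive.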
F1 score: The harmonic mean of precision and recall, which summarizes performance by balancing the two. A higher F1 score indicates better model performance [36].
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
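The count-based metrics above can be computed directly from a confusion matrix. A minimal sketch with hypothetical counts (not results from this study); balanced accuracy is taken as the mean of sensitivity and specificity, its standard definition. AUC-ROC cannot be computed from counts alone, because it needs the model's continuous scores.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, sensitivity, specificity, balanced accuracy, and F1."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall = TP / P
    specificity = tn / (tn + fp)          # TN / N
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Hypothetical counts, not results from this study
m = classification_metrics(tp=60, fp=20, tn=100, fn=40)
for name, value in m.items():
    print(f"{name:18s} {value:.3f}")
```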
Results and analysis
In our research, we utilized the dataset obtained from the "National STEPS Survey for Non-communicable Diseases Risk Factors in Bangladesh 2018" [11]. The ATP III criteria were employed to identify the crucial cardiovascular risk factors, and subsequently, the participants were categorized into two groups: MetS Group and Normal Group. The variables in our study can be classified into four categories: Lifestyle variables, Anthropometric variables, Clinical variables, and Biochemical variables [26, 37]. Lifestyle variables encompass factors such as smoking, alcohol consumption, and marital status. Anthropometric variables include age, weight, height, BMI, WHtR, waist circumference (WC), and hip circumference (HC). Clinical variables consist of systolic and diastolic blood pressure measurements. Lastly, biochemical variables encompass fasting blood glucose (FBG), high-density lipoprotein (HDL), and triglycerides (TGs).
Fig 1 depicts the process of constructing our proposed model. First, the most important variables are identified using Chi-Square and Random Forest techniques [17]. The selected variables are then used to train Decision Tree, Random Forest, Support Vector Machine, Extreme Gradient Boosting, K-nearest neighbors, and Logistic Regression models. The models are then validated, and their performance is compared using Precision, Sensitivity, Specificity, Balanced Accuracy, AUC score, Recall, and F1 score. Their performance is also visualized with ROC curves.
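The comparison step of this pipeline can be sketched with scikit-learn's `cross_validate`. This is an illustration under stated assumptions, not the authors' code: the data are synthetic with a positive-class share near the reported 27.8% prevalence, hyperparameters are illustrative, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the sketch needs no extra package. Specificity is obtained as recall of the negative class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; roughly 28% positives to mimic the reported MetS prevalence
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.72, 0.28], random_state=5)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Gradient Boosting (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
}
scoring = {
    "precision": "precision",
    "sensitivity": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),  # recall of class 0
    "balanced_accuracy": "balanced_accuracy",
    "roc_auc": "roc_auc",
    "f1": "f1",
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

Stratified folds keep the class ratio stable across folds, which matters when positives are the minority.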
Prevalence of MetS
In our study, the prevalence of MetS among the 8185 participants was 27.8%. Females made up 58.3% of the MetS group and males 41.7%. Fig 2 further shows that females predominate in both the MetS group and the Normal group.
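The 58.3% figure is the female share of MetS cases, not the prevalence of MetS among females, and females also make up 53.5% of the whole sample. The back-of-envelope arithmetic below converts between the two using only the rounded percentages and sample sizes stated in the text, so the resulting counts are approximations and not figures reported by the study.

```python
total, n_male, n_female = 8185, 3804, 4381   # sample sizes reported in the text
prevalence, female_share_of_cases = 0.278, 0.583

cases = round(total * prevalence)                    # approx. MetS cases
female_cases = round(cases * female_share_of_cases)  # approx. female cases
male_cases = cases - female_cases                    # approx. male cases

female_share_of_sample = n_female / total
prev_female = female_cases / n_female   # approx. within-sex prevalence, females
prev_male = male_cases / n_male         # approx. within-sex prevalence, males
print(f"cases={cases}, female={female_cases}, male={male_cases}")
print(f"females: {female_share_of_sample:.1%} of sample, "
      f"{female_share_of_cases:.1%} of cases")
print(f"within-sex prevalence: female {prev_female:.1%}, male {prev_male:.1%}")
```

Because the percentages are rounded, these derived values are only approximate.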
Table 2 summarizes the characteristics of the participants in the MetS group and the Normal group. The data were analyzed with the Chi-Square test in SPSS version 28.0.
Variable importance and key risk factors of metabolic syndrome
In the first stage of our analysis, we dichotomized the continuous variables WHtR, SBP, DBP, WC, HC, FBG, and TGs into two categories, Yes and No. Categorizing continuous variables into ranges can help models capture non-linear relationships and complex patterns, and it simplifies the data representation, which can make it easier for an algorithm to learn and predict. A standardized input format that represents all variable types appropriately is also important when dealing with heterogeneous data, and categorized variables make the model's output easier for users to interpret [38]. The categories were based on the ATP III criteria [26] applied to the original dataset, and we used this categorization for convenience in applying the various classification algorithms.
Upon examining Table 3 and Fig 3, we observed that Age, WHtR, SBP, DBP, WC, HC, FBG, Marital Status, TGs, and Residence were identified as key risk factors for Metabolic Syndrome (MetS) through the application of Chi-Square analysis. Among these variables, age had the highest score, followed by WHtR. SBP ranked third in terms of significance. DBP, WC, HC, and FBG held a moderate level of importance. Marital Status, TGs, and Residence were found to have the lowest significance.
Ranking these variables showed that age plays a crucial role in the development of MetS: individuals over the age of 40 were more susceptible to MetS. WHtR also emerged as a prominent risk factor.
In Fig 4, the application of Random Forest on our training dataset reveals that Age, SBP, WHtR, FBG, WC, DBP, marital status, HC, TGs, and smoking are identified as the key risk factors for Metabolic Syndrome (MetS). The Age variable demonstrates the highest score, as confirmed by Chi Square analysis, followed by SBP. WHtR ranks third in importance. FBG, WC, DBP, and marital status hold a medium position in terms of significance. On the other hand, HC, TGs, and smoking are considered the least influential factors. The ranking of these factors, similar to Chi-Square analysis, highlights Age as the most crucial determinant for Metabolic Syndrome, followed by Systolic Blood Pressure. It has been established that individuals with elevated systolic blood pressure face a greater risk of developing Metabolic Syndrome.
Model performance and comparison
The predominant key risk factors of Metabolic Syndrome have been identified as Age, SBP, WHtR, FBG, WC, DBP, HC, and TGs through the Random Forest Variable Importance Measures (VIMs) technique. In order to further investigate these variables, we employed six different ML algorithms, namely Decision Tree, Random Forest, Support Vector Machine, Extreme Gradient Boosting, K-Nearest Neighbors, and Logistic Regression. These algorithms were chosen to explore the potential relationship between the aforementioned eight variables and Metabolic Syndrome.
We evaluated our models using both cross-validation and external validation. The parameters of each model were optimized through hyperparameter tuning and incorporated into the final models; these parameters are listed in Table 4.
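Hyperparameter tuning with cross-validation, followed by a check on data held out from tuning, can be sketched with scikit-learn's `GridSearchCV`. The grid, the choice of logistic regression, and the use of ROC AUC as the tuning criterion are illustrative assumptions; the study's actual settings are those in Table 4. The held-out split only approximates external validation, which ideally uses an independent dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.72, 0.28], random_state=6)
# Hold out 20% that the search never sees
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=6)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = {"clf__C": [0.01, 0.1, 1.0, 10.0],
        "clf__class_weight": [None, "balanced"]}
search = GridSearchCV(pipe, grid, scoring="roc_auc",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_dev, y_dev)           # cross-validated search on the development set

holdout_auc = search.score(X_hold, y_hold)   # refit best model, scored by ROC AUC
print(search.best_params_, f"CV AUC={search.best_score_:.3f}",
      f"hold-out AUC={holdout_auc:.3f}")
```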
Table 4 shows that the Support Vector Machine (SVM) achieves the highest precision of all the algorithms, at 78%: of the cases the SVM predicts as MetS, 78% are true MetS cases. Logistic Regression and XGBoost achieve the highest balanced accuracy (71%) and sensitivity (63%). However, a sensitivity of 63% means that these models correctly detect only 63% of actual MetS cases, which is considerably low. For specificity, XGBoost and KNN outperform the other algorithms, correctly identifying 79% of negative cases. Logistic Regression achieves the best AUC score (0.75), meaning that for a randomly chosen pair of one MetS case and one Normal case, it assigns the higher score to the MetS case 75% of the time. In terms of recall, all classifiers perform similarly well, capturing 75% to 77% of positive cases. The F1 scores show that all classifiers perform moderately well. Considering all metrics together, Logistic Regression is the best classifier among the algorithms tested.
Table 5 shows the results of the six models when the excluded variables (BMI, HDL, sex, smoking, alcohol, and use of extra salt) are included. Including these variables reduced the overall performance of the models, so we removed them from our analysis because they did not contribute significantly to our results.
Fig 5 presents the same comparison graphically. Logistic Regression stands out as the most effective of the classification algorithms, whereas XGBoost performs worst, with the lowest precision of the algorithms compared.
Fig 6 shows the ROC curves of the different classification algorithms. By this measure, Logistic Regression is the most effective classification algorithm and SVM the least effective.
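An ROC curve is traced by sweeping a decision threshold over the model's scores, and the area under it equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case (ties count one half). A minimal pure-Python sketch on a four-point toy example shows that the trapezoidal area and the pairwise definition agree.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold through each distinct score."""
    P = sum(y_true)
    N = len(y_true) - P
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / N, tp / P))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_points(y_true, scores))
print(trapezoid_area(roc_points(y_true, scores)), auc_pairwise(y_true, scores))  # 0.75 0.75
```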
Discussion and recommendation
Our research represents a pioneering effort in predicting potential risk factors of MetS using ML techniques. The study included 8185 participants from various regions of Bangladesh and 16 distinct variables, comprising anthropometric data, lifestyle-related features gathered through questionnaires, and biochemical test results, all collected in the "National STEPS Survey for Non-communicable Diseases Risk Factors in Bangladesh 2018". Our analysis showed that the prevalence of MetS in this population was 27.8% [39, 40].
Notably, there were marked differences between the sexes: males made up 41.7% of the MetS group and females 58.3% [41, 42]. The higher prevalence of MetS among females may be attributed to various factors, including sociocultural activities, psychosocial behaviors, socioeconomic status, genetic inheritance, and hormonal changes, which make females more susceptible to developing MetS than males [43].
In this research, health parameters were ranked using Chi-Square and then compared with Random Forest to determine the significance of each variable. The most important variables for MetS in our sample of the Bangladeshi population, in order of importance, were Age, WHtR, SBP, DBP, WC, FBG, HC, and TGs. Notably, six of these eight variables correspond to the ATP III criteria for classifying individuals with MetS [44].
It is worth noting that WHtR was ranked as the second variable in terms of significance, which is a noteworthy discovery, particularly concerning the obesity crisis in our nation and its association with cardiovascular diseases, the leading cause of illness and death globally and in Bangladesh. Abdominal obesity has emerged as an indicator of cardiometabolic risk, prompting considerable endeavors to identify a suitable anthropometric measurement that accurately reflects the accumulation of fat in the abdominal region and can be conveniently obtained without the need for advanced technological equipment [17, 45].
Anthropometric indexes are also strongly influenced by factors such as age, sex, and ethnicity, so selecting a suitable index can be difficult. Both the Chi-Square and Random Forest analyses identified age as the most important factor, with individuals older than 40 years falling into the high-risk category for MetS. Although BMI is widely used as a measure of body fat content, it does not accurately reflect abdominal obesity, and it was not a significant predictor in our research [46, 47].
Waist circumference (WC) measurement is a simple and effective way to assess abdominal obesity and the associated metabolic risk. In our research, WC was the fifth most important risk factor for MetS, consistent with abdominal obesity significantly influencing the emergence of MetS.
The Random Forest Variable Importance Measures (VIMs) shown in Fig 4 assigned lower scores to BMI, HDL, Marital Status, Residence, Sex, Using Extra Salt, and Smoking, so these variables were not included in the modeling: including them decreased the overall accuracy of the models, whereas excluding them increased it [26].
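The variable-selection step can be illustrated with permutation importance, a model-agnostic way to score variables. This is a swapped-in technique, not the procedure behind Fig 4, since the section does not state which random forest importance measure was used (Strobl et al. [30] discuss biases in random forest variable importance measures). Each feature column is shuffled in turn and the resulting drop in accuracy is recorded; features whose shuffling costs nothing are candidates for exclusion. The `toy_model` rule and the data are hypothetical stand-ins for a trained classifier and the survey variables.

```python
import random


def accuracy(predict, X, y):
    """Share of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_features(names, importances, threshold=0.0):
    """Keep only features whose importance exceeds the threshold."""
    return [name for name, imp in zip(names, importances) if imp > threshold]


# Hypothetical stand-in model: uses only column 0 (WHtR).
def toy_model(row):
    return 1 if row[0] >= 0.5 else 0


names = ["whtr", "extra_salt"]
X = [[0.62, 1], [0.58, 0], [0.55, 1], [0.44, 1], [0.41, 0], [0.47, 0]]
y = [1, 1, 1, 0, 0, 0]
imp = permutation_importance(toy_model, X, y)
print(imp, select_features(names, imp))
```

Importances are best computed on held-out data rather than on the training set. For fitted scikit-learn estimators, `sklearn.inspection.permutation_importance` provides the same idea.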
In brief, our research found a moderately high prevalence of Metabolic Syndrome (MetS) in the Bangladeshi population, particularly among females. The primary risk factors identified were Age, Waist-to-Height Ratio (WHtR), elevated Systolic and Diastolic Blood Pressure (SBP, DBP), Waist Circumference (WC), Fasting Blood Glucose (FBG), Hip Circumference (HC), and Triglycerides (TGs), which align with the conventional definitions of MetS established by various organizations.
Machine learning models have shown potential in clinical settings for the early detection of metabolic syndrome and for timely intervention. Models built mainly on non-invasive or low-cost measurements can be a cost-effective option for large-scale screening. Using variables such as sex, Age, WHtR, SBP, DBP, WC, FBG, HC, and TGs, such models can identify individuals at high risk, and they can also draw on health records, lifestyle factors, and medical history to provide personalized risk assessments. Sufficiently accurate models would allow timely intervention and tailored treatment plans for individuals at risk of metabolic syndrome [48, 49]. Further investigation is needed to explore how machine learning models can be integrated into healthcare systems in Bangladesh for early detection and intervention.
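For screening, accuracy alone can be misleading when only about 28% of participants have MetS: labeling every participant as MetS-negative would already be about 72% accurate while detecting no cases. A minimal sketch of the confusion-matrix metrics commonly reported alongside accuracy (sensitivity or recall, specificity, precision, and F1; see [34–36]) follows; the labels are hypothetical toy data.

```python
def screening_metrics(y_true, y_pred):
    """Confusion-matrix metrics for a binary screen (1 = MetS, 0 = no MetS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (e.g., precision with no positive predictions) -> NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": ratio(tp, tp + fn),  # recall: share of true cases detected
        "specificity": ratio(tn, tn + fp),  # share of non-cases correctly cleared
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical labels with 28% positives, and a model that predicts "no MetS" for everyone.
y_true = [1] * 28 + [0] * 72
all_negative = [0] * 100
print(screening_metrics(y_true, all_negative))
```

Here accuracy is 0.72 while sensitivity is 0.0, which is why screening models are usually compared on sensitivity and specificity as well as accuracy.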
Recommendations
We recommend giving higher priority to older individuals in preventing Metabolic Syndrome (MetS), because this population is more susceptible to developing MetS owing to age-related physiological changes and lifestyle factors. By prioritizing older individuals, healthcare centers can focus targeted interventions and preventive measures on those most at risk. It is also crucial to educate females about the risk factors associated with MetS, as they are also more prone to developing this condition. Women experience hormonal changes throughout their lives, for example during pregnancy and menopause, which can contribute to the development of MetS. By raising awareness among females, healthcare centers can help them make informed decisions about their health and take the steps needed to prevent MetS [50].
Overall, prioritizing older individuals and providing targeted education on MetS risk factors, with a special emphasis on WHtR, can significantly contribute to the prevention and management of this condition. By promoting awareness and encouraging individuals to be conscious of their weight, WC, and HC, healthcare centers can empower individuals to take control of their health and reduce the burden of MetS in the population [50, 51].
Conclusion
In summary, our research estimated the prevalence of MetS in Bangladesh at 27.8%, with a higher prevalence among females. Using Random Forest and Chi-Square methods, we identified several potential risk factors for MetS, including increased Age, WHtR, SBP, DBP, WC, FBG, HC, and TGs. Machine learning models of this kind could be used to assess the factors influencing various diseases and to support treatment strategies and decision-making.
Furthermore, it is crucial to recognize MetS as a global epidemic. In order to limit its further spread and reduce the associated morbidity and mortality, it is imperative to implement regional measures that prioritize primary prevention.
Limitations and strength
Exclusion of potentially relevant variables.
Certain variables were excluded from the analysis because they were not significant. Moreover, the performance of our models is not particularly high. In addition, BMI, a widely used measure of obesity, was not found to be significant in our research. Finally, HDL, an important factor in cardiovascular disease, was also not significant in our study [52].
Strength
Strengths of our study.
Despite these limitations, the primary strength of this study is the large dataset obtained from the "National STEPS Survey for Non-communicable Diseases Risk Factors in Bangladesh 2018". We employed six widely used classification algorithms to predict MetS. To our knowledge, this is the first study of its kind to estimate the prevalence and potential risk factors of MetS in Bangladesh using machine learning.
Acknowledgments
We gratefully acknowledge access to the dataset from the "National STEPS Survey for Non-communicable Diseases Risk Factors in Bangladesh 2018".
References
- 1. Samson SL, Garber AJ. Metabolic Syndrome. Endocrinol Metab Clin North Am. 2014 Mar;43(1):1–23. pmid:24582089
- 2. World Health Organization [Internet]. 2023. Diabetes.
- 3. National Heart, Lung, and Blood Institute [Internet]. 2022. Metabolic Syndrome.
- 4. Wasim Bin Habib, Mohammad Al-Masum Molla. Cardiovascular diseases: A rising concern for young people. The Daily Star. 2022 Sep 29;
- 5. Chowdhury MZI, Anik AM, Farhana Z, Bristi PD, Abu Al Mamun BM, Uddin MJ, et al. Prevalence of metabolic syndrome in Bangladesh: A systematic review and meta-analysis of the studies. BMC Public Health. 2018 Mar 2;18(1). pmid:29499672
- 6. Naghipour M, Joukar F, Nikbakht HA, Hassanipour S, Asgharnezhad M, Arab-Zozani M, et al. High prevalence of metabolic syndrome and its related demographic factors in north of Iran: Results from the PERSIAN guilan cohort study. Int J Endocrinol. 2021;2021. pmid:33859688
- 7. Belayneh M, Mekonnen TC, Tadesse SE, Amsalu ET, Tadese F. Sleeping duration, physical activity, alcohol drinking and other risk factors as potential attributes of metabolic syndrome in adults in Ethiopia: A hospital-based cross-sectional study. PLoS One. 2022 Aug 29;17(8):e0271962. pmid:36037175
- 8. Centers for Disease Control and Prevention [Internet]. 2023. Know Your Risk for High Cholesterol.
- 9. Seeber B, Morandell E, Lunger F, Wildt L, Dieplinger H. Afamin serum concentrations are associated with insulin resistance and metabolic syndrome in polycystic ovary syndrome. Reproductive Biology and Endocrinology. 2014 Dec 10;12(1):88. pmid:25208973
- 10. Stubert J, Reister F, Hartmann S, Janni W. The Risks Associated With Obesity in Pregnancy. Dtsch Arztebl Int. 2018 Apr 20; pmid:29739495
- 11. Riaz BK, Islam MZ, Islam ANMS, Zaman MM, Hossain MA, Rahman MM, et al. Risk factors for non-communicable diseases in Bangladesh: Findings of the population-based cross-sectional national survey 2018. BMJ Open. 2020 Nov 27;10(11). pmid:33247026
- 12. Bangladesh Demographic and Health Survey 2017–18: Key Indicators [Internet]. 2019. Available from: http://www.niport.gov.bd
- 13. Hossain MB, Khan MdN, Oldroyd JC, Rana J, Magliano DJ, Chowdhury EK, et al. Prevalence of, and risk factors for, diabetes and prediabetes in Bangladesh: Evidence from the national survey using a multilevel Poisson regression model with a robust variance. PLOS Global Public Health. 2022 Jun 1;2(6):e0000461. pmid:36962350
- 14. Janiesch C, Zschech P, Heinrich K. Machine learning and deep learning. Available from: https://doi.org/10.1007/s12525-021-00475-2
- 15. Datta S, Schraplau A, Da Cruz HF, Sachs JP, Mayer F, Bottinger E. A machine learning approach for non-invasive diagnosis of metabolic syndrome. In: Proceedings—2019 IEEE 19th International Conference on Bioinformatics and Bioengineering, BIBE 2019. Institute of Electrical and Electronics Engineers Inc.; 2019. p. 933–40.
- 16. Ivanović D, Kupusinac A, Stokić E, Doroslovački R, Ivetić D. ANN Prediction of Metabolic Syndrome: a Complex Puzzle that will be Completed. J Med Syst. 2016 Dec 1;40(12).
- 17. Gutiérrez-Esparza GO, Vázquez OI, Vallejo M, Hernández-Torruco J. Prediction of metabolic syndrome in a Mexican population applying machine learning algorithms. Symmetry (Basel). 2020 Apr 1;12(4).
- 18. Xia SJ, Gao BZ, Wang SH, Guttery DS, Li CD, Zhang YD. Modeling of diagnosis for metabolic syndrome by integrating symptoms into physiochemical indexes. Biomedicine and Pharmacotherapy. 2021 May 1;137. pmid:33588265
- 19. Zhang H, Chen D, Shao J, Zou P, Cui N, Tang L, et al. Machine learning-based prediction for 4-year risk of metabolic syndrome in adults: A retrospective cohort study. Risk Manag Healthc Policy. 2021;14:4361–8. pmid:34707419
- 20. Salim Hossain M, Zahedur Rahaman M, Banik S, Shahid Sarwar M, Yokota K. Prevalence of the metabolic syndrome in diabetic patients living in a coastal region of Bangladesh. IJPSR [Internet]. 2012;3(8):8. Available from: www.ijpsr.com
- 21. Mehata S, Shrestha N, Mehta RK, Bista B, Pandey AR, Mishra SR. Prevalence of the Metabolic Syndrome and its determinants among Nepalese adults: Findings from a nationally representative cross-sectional study. Sci Rep. 2018 Dec 1;8(1). pmid:30301902
- 22. Shah HA, Liu J, Yang Z, Zhang X, Feng J. DeepRF: A deep learning method for predicting metabolic pathways in organisms based on annotated genomes. Comput Biol Med. 2022 Aug;147:105756. pmid:35759992
- 23. Janssen I, Katzmarzyk PT, Ross R. Body Mass Index, Waist Circumference, and Health Risk Evidence in Support of Current National Institutes of Health Guidelines [Internet]. Available from: http://archinte.jamanetwork.com/ pmid:12374515
- 24. Chobanian Aram V., Bakris George L., Black Henry R., Cushman William C., Green Lee A., Izzo Jr Joseph L., et al. Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. AHA/ASA Journals. 2003 Dec 1;42(6).
- 25. Executive Summary of the Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) [Internet]. Available from: http://jama.jamanetwork.com/
- 26. Grundy SM, Brewer HB, Cleeman JI, Smith SC, Lenfant C. Definition of Metabolic Syndrome: Report of the National Heart, Lung, and Blood Institute/American Heart Association Conference on Scientific Issues Related to Definition. In: Circulation. 2004. p. 433–8. pmid:14744958
- 27. Java Point [Internet]. Decision Tree Classification Algorithm.
- 28. Song YY, Lu Y. Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry. 2015 Apr 1;27(2):130–5. pmid:26120265
- 29. Breiman L. Random Forests. Vol. 45. 2001.
- 30. Strobl C, Boulesteix AL, Zeileis A, Hothorn T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics. 2007;8. pmid:17254353
- 31. Vijay Kanade. What Is a Support Vector Machine? Working, Types, and Examples. 2002.
- 32. Ogunleye AA, Qing-Guo W. XGBoost Model for Chronic Kidney Disease Diagnosis.
- 33. Triguero I, García-Gil D, Maillo J, Luengo J, García S, Herrera F. Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data. Vol. 9, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. Wiley-Blackwell; 2019.
- 34. Salma Ghoneim. Medium. 2019. Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on?
- 35. Erickson BJ, Kitamura F. Magician’s corner: 9. performance metrics for machine learning models. Vol. 3, Radiology: Artificial Intelligence. Radiological Society of North America Inc.; 2021. pmid:34136815
- 36. Namrata Kapoor. Numpy Ninja. 2021. Recall, Specificity, Precision, F1 Scores and Accuracy.
- 37. Moy FM, Bulgiba A. The modified NCEP ATP III criteria maybe better than the IDF criteria in diagnosing Metabolic Syndrome among Malays in Kuala Lumpur. BMC Public Health. 2010 Dec 6;10(1):678.
- 38. Riccardo Di Sipio. Medium. 2021. The Definitive Way to Deal With Continuous Variables in Machine Learning.
- 39. Kim J, Mun S, Lee S, Jeong K, Baek Y. Prediction of metabolic and pre-metabolic syndromes using machine learning models with anthropometric, lifestyle, and biochemical factors from a middle-aged population in Korea. BMC Public Health. 2022 Dec 1;22(1). pmid:35387629
- 40. Marbou WJT, Kuete V. Prevalence of metabolic syndrome and its components in Bamboutos division’s adults, West Region of Cameroon. Biomed Res Int. 2019;2019. pmid:31183378
- 41. Lin CS, Lee WJ, Lin SY, Lin HP, Chen RC, Lin CH, et al. Subtypes of Premorbid Metabolic Syndrome and Associated Clinical Outcomes in Older Adults. Front Med (Lausanne). 2022 Feb 11;8. pmid:35223876
- 42. Jesmin S, Islam S, Akter S, Islam M, Nusrat Sultana S, Yamaguchi N, et al. Metabolic syndrome among pre-and post-menopausal rural women in Bangladesh: result from a population-based study [Internet]. 2013. Available from: http://www.biomedcentral.com/1756-0500/6/157 pmid:23597398
- 43. Liang X, Or B, Fung Tsoi M, Lung Cheung C, Cheung BM, Bernard Cheung CM. Prevalence of Metabolic Syndrome in the United States National Health and Nutrition Examination Survey (NHANES) 2011–2018. Available from: https://doi.org/10.1101/2021.04.21.21255850
- 44. Worachartcheewan A, Nantasenamat C, Isarankura-Na-Ayudhya C, Pidetcha P, Prachayasittikul V. Identification of metabolic syndrome using decision tree analysis. Diabetes Res Clin Pract. 2010 Oct;90(1):e15–8. pmid:20619912
- 45. Chen MS, Chiu CH, Chen SH. Risk assessment of metabolic syndrome prevalence involving sedentary occupations and socioeconomic status. BMJ Open. 2021 Dec 13;11(12):e042802. pmid:34903529
- 46. Tamrakar R, Yang X, Pradhan S, Su X, Luo Z, Li L, et al. Machine learning methods for the prediction of prevalence and potential risk factors of Metabolic Syndrome in Guangxi, China [Internet]. Available from: https://ssrn.com/abstract=4341038
- 47. Binh TQ, Phuong PT, Nhung BT, Tung DD. Metabolic syndrome among a middle-aged population in the red river delta region of Vietnam. BMC Endocr Disord. 2014 Sep 26;14(1). pmid:25261978
- 48. Xu W, Zhang Z, Hu K, Fang P, Li R, Kong D, et al. Identifying metabolic syndrome easily and cost effectively using non-invasive methods with machine learning models. Diabetes, Metabolic Syndrome and Obesity. 2023;16:2141–51. pmid:37484515
- 49. Shin H, Shim S, Oh S. Machine learning-based predictive model for prevention of metabolic syndrome. PLoS One. 2023 Jun 1;18(6). pmid:37267302
- 50. Lin YH, Chu LL, Kao CC, Chen TB, Lee I, Li HC. The Effects of a Diet and Exercise Program for Older Adults With Metabolic Syndrome. Journal of Nursing Research. 2015 Sep;23(3):197–205. pmid:25741965
- 51. Ashwell M, Gunn P, Gibson S. Waist-to-height ratio is a better screening tool than waist circumference and BMI for adult cardiometabolic risk factors: Systematic review and meta-analysis. Vol. 13, Obesity Reviews. 2012. p. 275–86. pmid:22106927
- 52. Wu J, Zhou X, Ren J, Zhang Z, Ju H, Diao X, et al. Glycosyltransferase-related prognostic and diagnostic biomarkers of uterine corpus endometrial carcinoma. Comput Biol Med. 2023 Sep;163:107164. pmid:37329616