Abstract
Diabetes, a chronic metabolic condition characterised by persistently high blood sugar levels, necessitates early detection to mitigate its risks. Inadequate dietary choices can contribute to various health complications, emphasising the importance of personalised nutrition interventions. However, real-time selection of diets tailored to individual nutritional needs is challenging because of the intricate nature of foods and the abundance of dietary sources. Because diabetes is chronic, patients must maintain a healthy diet. They also frequently need to visit their doctor and rely on expensive medications to manage their condition, and regularly purchasing medication for a chronic illness is difficult in underdeveloped nations. Motivated by this, we propose a hybrid model that first predicts diabetes and then suggests a diet and exercise regimen, rather than depending solely on medication and doctor visits. This research proposes an optimized approach that harnesses machine learning classifiers, including Random Forest, Support Vector Machine, and XGBoost, to develop a robust framework for accurate diabetes prediction. The study addresses the difficulty of predicting diabetes precisely from limited labeled data in the presence of outliers in diabetes datasets. Furthermore, a food and exercise recommender system is presented that offers individualized, health-conscious nutrition recommendations based on user preferences and medical information. Leveraging efficient learning and inference techniques, the study achieves an error rate of less than 30% using an extensive dataset comprising over 100 million user-rated foods. This research underscores the significance of integrating machine learning classifiers with personalized nutritional recommendations to enhance diabetes prediction and management.
The proposed framework has substantial potential to facilitate early detection, provide tailored dietary guidance, and alleviate the economic burden associated with diabetes-related healthcare expenses.
Citation: Sajid M, Malik KR, Khan AH, Iqbal S, Alaulamie AA, Ilyas QM (2025) Next-generation diabetes diagnosis and personalized diet-activity management: A hybrid ensemble paradigm. PLoS ONE 20(1): e0307718. https://doi.org/10.1371/journal.pone.0307718
Editor: M. Shamim Kaiser, Jahangirnagar University, BANGLADESH
Received: February 27, 2024; Accepted: July 10, 2024; Published: January 8, 2025
Copyright: © 2025 Sajid et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: PIMA and Food datasets are publicly available and can be accessed using the following links: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database and https://fdc.nal.usda.gov/download-datasets.html.
Funding: This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Grant No. KFU241647 to SI).
Competing interests: No authors have competing interests.
Introduction
Diabetes is a prevalent metabolic disorder. Type 2 diabetes most commonly first appears in middle age, though it can strike at any age, and similar conditions have also been reported in children. Several factors, including sedentary lifestyles, body weight, genetic predisposition, and eating habits, contribute to diabetes. Untreated diabetes can lead to hyperglycemia, which is defined by unusually high blood sugar levels. For patients to live longer and have a higher quality of life, early diabetes detection is crucial [1]. The main modifiable risk factors for type 2 diabetes include being overweight or obese, physical inactivity, and an unhealthy diet. The majority of countries state that their national policies aim for roughly 89% physical activity and a healthy diet. However, the picture changes when funding and implementation are taken into account [2].
The pancreas, one of the body’s most important organs, influences how sugar, protein, and fat are used for daily energy. When insulin levels are low or absent, blood glucose (blood sugar) concentrations are high and excess sugar is expelled in the urine; this medical condition is called diabetes mellitus [3]. Diabetes was a factor in 1.5 million fatalities in 2012. Over 2.2 million deaths were attributed to heart diseases and other causes, most likely brought on by blood glucose levels that were not in the optimal range. Patients with prediabetes must live healthy lives and receive symptomatic therapy and early identification [4]. Determining a person’s susceptibility to and predisposition toward a chronic illness such as diabetes is important. Early identification reduces the risk of more serious health conditions developing and lowers the cost of treating chronic disorders. To help physicians make better decisions about patient treatment in high-risk situations, precise deductions from quantifiable medical indicators are essential, particularly in emergencies where a patient may be unconscious or unable to communicate [5].
A healthy diet consists of certain foods consumed in specific amounts to meet nutritional needs [6]. The primary causes of health problems are inadequate food and an unrestricted habitual diet [7]. A well-balanced diet is the best way to obtain the required nutrients. However, the average person is still unaware of the most common reasons for an excess or a deficiency of nutrients such as calcium, proteins, and vitamins [8]. According to a World Health Organization (WHO) study, an estimated 347 million people worldwide have diabetes. Except for breastfeeding, no diet gives the body all the essential nutrients it needs to stay healthy and perform its functions [9].
Diabetes Mellitus (DM) can be diagnosed either manually by a doctor or automatically. Compared with manual DM identification techniques, advances in artificial intelligence (AI) and machine learning (ML) have increased the likelihood and efficacy of early illness detection and diagnosis [10]. Benefits include a lower likelihood of human error and a reduced workload for medical personnel. Computer-based decision support systems may be essential to effective treatment and precise diagnosis. The DM field produces big data from laboratory analyses, patient reports, treatment, medication, and follow-ups, among other sources. Assembling all the required data by hand is difficult, and inappropriate data management has degraded the quality of data organization [11]. The goal of this work is to develop a hybrid model technique for classifying and recognizing diabetes. Fig 1 illustrates the process of detecting diabetes. In this study, a hybrid model for people with recognized diabetes is created using both machine learning and deep learning. Classifiers such as Support Vector Machine (SVM) and Random Forest (RF) are trained on a diabetes dataset to identify people with diabetes. Their output is then passed to the deep learning algorithm of the recommender system, a Restricted Boltzmann Machine (R-BM), which uses it to suggest a healthy diet to diabetes patients.
For this study, a hybrid strategy comprising two distinct models was created: the first identifies whether a patient has diabetes, and the second recommends a healthy diet for those with diabetes. Based on several risk factors, the system can estimate a person’s likelihood of developing diabetes, provide physicians with an early diabetes diagnosis, and inform patients of their doctor’s advice on nutrition, exercise, and blood glucose monitoring. Finding the most significant feature in a dataset can be difficult. As the contributions of several authors and researchers show, even the best feature selection cannot guarantee significant quality or 100% correctness [12]. Few researchers have developed a technique that accurately predicts instances using recurrent neural networks or deep learning.
The collaborative filtering strategy is used in this study. While collaborative filtering is a feasible approach, its predictive ability is limited when only a small number of likes is available, which affects the ability to extract significant attributes. When developing deep learning models, Restricted Boltzmann Machine (R-BM) approaches are essential since they yield more accurate results by hiding the details and exploiting quickly learned properties [13]. To address this problem, the model parameters that are optimal for root mean square error (RMSE) and recommendation accuracy are selected. The primary contributions of the hybrid model are as follows:
- This study offers a useful and optimal method for identifying diabetic patients from laboratory medical reports.
- This study integrates machine learning classifiers for diabetes detection with Restricted Boltzmann Machine algorithms to suggest dietary and physical activity plans for people with diabetes.
- Time and money can be saved by using the suggested strategy, which provides faster processing and computations.
- Compared with current models for diabetes detection and dietary guidance, the suggested method is more compact and has fewer parameters. It was evaluated on test datasets.
Organization
This paper is organized as follows. Related work is presented in the first section. Data screening, investigation, and preprocessing are covered in the next section, followed by a detailed description and discussion of the proposed hybrid model. The training section explains the training and testing process of the proposed model, and the results of the hybrid model are presented in the section after that. The last section concludes the study and discusses some future research.
Related work
Foreseeing diabetes is crucial for effective treatment to prevent the disease’s subsequent consequences. Studies on disease classification, diagnosis, prediction, and medicine have been done in great numbers. They have caused both conventional and machine learning techniques to advance and become noticeably more effective. For the classification of diseases, numerous ensemble methods and machine-learning algorithms have been employed [14]. Increased resistance to insulin in the body is a hallmark of type 2 diabetes, which means that the amount of insulin generated is insufficient to meet the body’s metabolic requirements. The most typical kind of diabetes in people is type 2 diabetes [15].
One study proposed a model for diabetes prediction using an artificial neural network (ANN) to help doctors and other medical professionals. The parameters used in that investigation were a woman’s age, number of pregnancies, Body Mass Index (BMI), heredity, height, and weight, all of which are significant factors in determining whether or not a person develops diabetes [16]. The creation of these models is fraught with difficulties. Type 2 diabetes (T2D) is a complex metabolic condition with a wide range of symptoms and associated concomitant diseases. For this condition to be controlled and for afflicted individuals to receive successful treatment, it is crucial to identify the key aspects. T2D has high development and medical expenditures, and numerous poorly understood risk factors exist. Nevertheless, much development work has been done on classifying T2D using various computational methods [17].
This limits the applicability of models developed in high-income nations. Second, a community-based strategy drastically restricts the volume and complexity of data that community health professionals may gather. Because of this, many high-income country models rely on cutting-edge data [18]. It is impossible to overestimate the advantages of digital healthcare. Digital healthcare can improve accessibility, minimize costs, and improve the quality of health services while reducing service delivery inefficiencies. Above all, digital healthcare may provide a pervasive and practical solution to the issues now facing the global healthcare sector [19].
Software tools and methods called recommender systems make suggestions to users. Recommendations can be for anything, including books, movies, news, and music tracks. The recommendation system generates suggestions by considering user interests and contextual data. Given the wealth of information available, it is imperative to eliminate useless information [20]. Web, app, and SNS platform traffic have increased dramatically due to the growth and adoption of the Internet and smart devices. Additionally, these platforms are gathering more and more diverse data that may be used to determine consumer preferences. Notably, the use of SNS platforms by users allows for the collection of a variety of data, including information on the user’s followers and their followers’ followers, tweet data, and user-uploaded content. Additionally, the development of wearable sensors connected to smart devices makes it easier to collect various user-related medical data and exercise-related data [21].
Security and privacy issues have been discussed extensively in the literature across various areas of computer science. Numerous difficulties relating to the growth of recommender system (RS) research have been raised, and several studies covering these topics have documented a few security and privacy issues for users [22]. To the best of our knowledge, no detailed work carefully considers modern privacy and security issues. As a result, this paradigm offers a thorough investigation into a range of privacy and security issues, including trust, authentication, end-user privacy, damaging attacks, the fairness of RSs, their bias, their filter bubbles, and their ethics [22]. Healthcare informatics currently makes extensive use of machine learning (ML). Various crucial tools facilitate medical data analysis using ML [23]. Modern hospitals have the required monitoring and data-gathering technologies because data is incorporated into and stored in large information systems. According to researchers, ML is the best method for assessing health-related data. In hospitals, proper diagnostic values must be loaded into a computer program together with patient data to execute the learning algorithm. Clinical information is provided for an appropriate diagnosis. The past medical histories of treated patients automatically provide knowledge of the clinical diagnosis [24].
Preprocessing
The diabetes dataset was downloaded from Kaggle, as given in Table 1. The datasets provided by the United States Department of Agriculture are scattered and insufficient. The data must be cleaned before it can be processed and passed to the Restricted Boltzmann Machine, a deep learning neural network approach. Diseases that contribute significantly to a massive rise in glucose levels can be brought on by a shortage or excess of compositional substances, as shown in Fig 2.
Four datasets were collected for the Deep Learning Neural Network as shown in Table 2. A critical factor in merging Food datasets is the Food ID. Other characteristics have been eliminated since they are unnecessary [25] as shown in Fig 3.
Food Nutrient is another table that contains details about the nutrients in each food, including their quantity, ID, and name; its key fields are the Food ID and the Nutrient ID. The final table is Nutrient, which includes each nutrient’s ID, name, and quantity. During preprocessing, Branded Food is combined with the Food dataset on the Food ID to create Merged Food, and the Food Nutrient and Nutrient tables are joined on the Nutrient ID to create the Merged dataset. In both cases, non-essential features are removed before the datasets are combined. The Restricted Boltzmann Machine is then given the Final Food dataset based on the Global Food ID [25].
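The two joins described above can be sketched as follows. This is a minimal illustration rather than the preprocessing code used in the study: the column names (`fdc_id`, `nutrient_id`, `id`, `description`, `brand_owner`, `name`, `amount`), the helper `inner_join`, and all toy rows are assumptions made for the example.

```python
# Toy stand-ins for the four source tables. All rows and column names are
# illustrative assumptions, not values taken from the USDA files.
food = [
    {"fdc_id": 1, "description": "Oat cereal"},
    {"fdc_id": 2, "description": "Lentil soup"},
]
branded_food = [
    {"fdc_id": 1, "brand_owner": "Brand A"},
    {"fdc_id": 2, "brand_owner": "Brand B"},
]
food_nutrient = [
    {"fdc_id": 1, "nutrient_id": 10, "amount": 12.0},
    {"fdc_id": 2, "nutrient_id": 10, "amount": 9.0},
    {"fdc_id": 2, "nutrient_id": 20, "amount": 3.5},
]
nutrient = [
    {"id": 10, "name": "Protein"},
    {"id": 20, "name": "Fiber"},
]


def inner_join(left, right, left_key, right_key):
    """Inner-join two lists of dict rows on the given key columns."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for l_row in left:
        for r_row in index.get(l_row[left_key], []):
            # Left-hand columns win if both rows share a column name.
            joined.append({**r_row, **l_row})
    return joined


# Food + Branded Food on the food ID -> "Merged Food".
merged_food = inner_join(food, branded_food, "fdc_id", "fdc_id")
# Food Nutrient + Nutrient on the nutrient ID -> merged nutrient table.
merged_nutrient = inner_join(food_nutrient, nutrient, "nutrient_id", "id")
# Final table: one row per (food, nutrient) pair, joined on the food ID.
final_food = inner_join(merged_nutrient, merged_food, "fdc_id", "fdc_id")
# Drop the columns that are not needed downstream.
final_food = [
    {key: row[key] for key in ("fdc_id", "description", "name", "amount")}
    for row in final_food
]
```

On real data the same joins would normally be done with a dataframe library; the dictionary-based version is used here only to keep the sketch dependency-free.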
Fig 4 illustrates the hybrid model architecture, in which diabetes is first recognized using machine learning techniques and patients are then given diet and exercise recommendations using deep learning algorithms. The human body needs specific components to function correctly. Two distinct datasets serve as inputs. The data is preprocessed, as shown in Table 3, so that the nutrition calculation algorithm can use it. The food inference algorithm needs the body’s nutritional status to be determined in order to function correctly and suggest the right foods. The nutrition calculation algorithm takes the input parameters and provides the recommended daily intakes of compositional ingredients depending on gender. The estimated nutritional value is then forwarded to the food inference algorithm, which offers suggestions for food and exercise [26].
The RBM receives the preprocessed food dataset and calculates the diabetic patient’s food intake. After ingesting the suggested foods, physical activity is also encouraged to ensure that the patient’s body is kept in good condition and that they can live a happy life.
Proposed hybrid model
The following illustrates a mathematical formulation that uses Random Forest to forecast diabetes [10]. Let Prob represent the likelihood that a patient has diabetes, and let X_1, X_2, …, X_n be the input variables. The Random Forest model can then be represented by Eq 1:
(1)
RF stands for the Random Forest model, combining several different decision trees. Based on the majority vote of the decision trees, each decision tree Ki in the RF contributes to the prediction through Eq 2:
(2)
The final probability is determined by Eq 3, which averages the probabilities of all the decision trees.
(3)
Each decision tree receives the input variables X_1, X_2, …, X_n and makes a binary decision based on a threshold value at each node. The decision tree’s output is a binary classification of the patient as diabetic or non-diabetic. One of the main issues with collaborative filtering is the lack of data. As a result, users who have tasted the same dish are grouped together to train the RBM model, and each user group has its own RBM. However, the number of visible units varies depending on how many common food items a group views or selects [10].
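The tree-averaging idea behind Eqs 1–3 can be sketched with a toy forest of single-threshold rules. The feature names, thresholds, and the helpers `make_stump`, `forest_probability`, and `forest_predict` are hypothetical; a real Random Forest learns its trees from data.

```python
from typing import Callable, Dict, List

Patient = Dict[str, float]
Tree = Callable[[Patient], int]  # each tree votes 1 (diabetic) or 0


def make_stump(feature: str, threshold: float) -> Tree:
    """Return a one-node 'tree' that votes 1 when the feature exceeds the threshold."""
    return lambda patient: int(patient[feature] > threshold)


def forest_probability(trees: List[Tree], patient: Patient) -> float:
    """Average the binary votes of all trees (fraction voting 'diabetic')."""
    votes = [tree(patient) for tree in trees]
    return sum(votes) / len(votes)


def forest_predict(trees: List[Tree], patient: Patient) -> int:
    """Majority vote: predict 1 when more than half of the trees vote 1."""
    return int(forest_probability(trees, patient) > 0.5)


# Hypothetical thresholds, for illustration only.
trees = [
    make_stump("glucose", 140.0),
    make_stump("bmi", 30.0),
    make_stump("age", 50.0),
    make_stump("glucose", 125.0),
]
patient = {"glucose": 150.0, "bmi": 32.0, "age": 45.0}
prob = forest_probability(trees, patient)  # three of the four trees vote 1
label = forest_predict(trees, patient)
```

Library implementations may differ in detail; scikit-learn's `RandomForestClassifier`, for example, averages the class probabilities predicted by each tree rather than counting hard votes.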
The RBM is a binary energy-based model with a visible and a hidden layer, as depicted in Fig 5. The weight matrix W describes the strength of the connections between hidden units H_j and visible units V_i. Bias weights (offsets) are added for these units. The effectiveness of the RBM model can be improved by including additional hidden units. Visible nodes represent food items, whereas hidden nodes represent features that can be used to determine correlations between connected neurons. Therefore, the number of visible units and the group size are influenced by the food item’s attraction, as illustrated in Fig 6. RBMs determine the probability distribution for each food item before transferring it to hidden units that stand in for attributes. Scores differ between scenarios, and this study graded numerous cases differently, as shown in Tables 4 and 5.
Some people might choose an inexpensive diet but dislike the excessive protein reflected in the score distribution. Conversely, many patients may dislike affordable meals and choose highly rated cuisine. As a result, several alternatives based on this study are implemented for various users. Positive weights are assigned to connections associated with dislikes, whereas negative weights are given to connections associated with likes. The model considers a user’s history, popularity, and similarity scores and displays the options that are most appealing to the user. The similarity and popularity scores S, P ∈ [0, 1] help with training when their values are near one. “Similarity” refers to how a user distributes their ratings [27].
The following Eq 4 is used to determine how similar the food item “i” and user “u” are:
(4)
The following Eq 5 is used to determine how popular food item “i” is:
(5)
The energy is defined by Eq 6:
(6)
where the activation energy is given by Eq 7:
(7)
W_ij is the weight between units i and j, and x_j is the state of unit j (0 or 1). Fig 7 illustrates how groups of scenarios are created and rated according to the system’s energy. Some scenarios obtain favorable evaluations while others do not, as seen in Fig 8.
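For orientation, the textbook formulation of a binary RBM is reproduced below. It may differ in detail from Eqs 6 and 7 of this study (for instance, it has no similarity or popularity layers), and the symbols a_i and b_j for the visible and hidden biases are notation assumed here.

```latex
\begin{align}
E(v, h) &= -\sum_{i} a_i v_i \;-\; \sum_{j} b_j h_j \;-\; \sum_{i}\sum_{j} v_i W_{ij} h_j \\
p(h_j = 1 \mid v) &= \sigma\Big(b_j + \sum_{i} v_i W_{ij}\Big),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}} \\
p(v_i = 1 \mid h) &= \sigma\Big(a_i + \sum_{j} W_{ij} h_j\Big)
\end{align}
```

Lower energy corresponds to a more probable joint configuration of visible and hidden units, which is why scenarios can be ranked by the energy the model assigns to them.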
High scores correspond to outcomes that are likely to occur, whereas low scores correspond to unlikely ones. Situations with high scores have heavier weights, whereas those with low scores have lighter weights. Zero scores must be avoided in the model, yet some situations produce them. To reduce the number of zeros that could cause calculations to go wrong, and to boost processing speed, exponential functions convert both positive and negative values into strictly positive values. The following exponential (Eq 8) and partition (Eq 9) functions are used to prevent zero scores [28].
(8)
where k is the normalizing (partition) term, which may be described as:
(9)
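The exponential and normalization step of Eqs 8 and 9 can be illustrated with a small sketch. The example scores and the helper name `scenario_probabilities` are assumptions made for this illustration.

```python
import math
from typing import List


def scenario_probabilities(scores: List[float]) -> List[float]:
    """Map real-valued scores to probabilities: exp(score) / sum of exp(scores)."""
    shift = max(scores)  # subtracting the maximum avoids overflow; the result is unchanged
    weights = [math.exp(s - shift) for s in scores]  # strictly positive, never zero
    k = sum(weights)  # normalizing (partition) term
    return [w / k for w in weights]


# Positive, zero, and negative scores all receive a non-zero probability.
probs = scenario_probabilities([2.0, 0.0, -1.0])
# Equal scores (for example, all weights at zero) give a uniform distribution.
uniform = scenario_probabilities([0.0, 0.0, 0.0, 0.0])
```

The sketch shows why the exponential removes zero scores: every scenario keeps a positive probability, and only the relative sizes of the scores determine the ranking.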
If all weights are 0, every scenario has the same probability. In this case, each option would have an unnormalized score of e^0 = 1, which is inconsistent with the data because different scenarios call for different probabilities. The final probabilities must reflect that some food items score highly and others score poorly depending on whether they are inexpensive, well-liked by other patients, or provide more nutrients for that user. Contrastive divergence with restricted Boltzmann machines increases the probabilities of the observed outcomes. Each data point in the dataset is evaluated in turn. In the first instance, the scenarios involving highly rated food products are emphasized, which slightly increases their probabilities while lowering the likelihood of all other possibilities. For the next data point, the probability of the food items with the highest ratings is again increased, while the probability of all other cases is decreased. This process is applied to the whole dataset. As a result, foods are given high scores that assist in adjusting the weights. Eqs 10–13 give the joint distribution of scores (V, H).
(10)
(11)
(12)
(13)
where σ is the activation function. The log-likelihood for training is found in the Eqs 14–16.
(14)
(15)
(16)
where the rate of learning is ϵ. The number of times that the features ith and jth of a food item are combined in user ratings is calculated using joint distribution [29]. This study employs the most often used gradient estimation, contrastive divergence [30]. The expected ratings for food item q used for model testing are represented by the Eqs 17 and 18.
(17)
(18)
These steps involve many calculations that call for considerable processing power. Gibbs sampling is another technique used with restricted Boltzmann machines. By selecting data points at random, Gibbs sampling raises the likelihood of one particular outcome while lowering the likelihood of all others, as shown in Fig 9. When a sample with a high score is chosen at random, the likelihood of just that sample is increased. A sample with a low score is less likely to be the one chosen at random.
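The interplay of Gibbs sampling and contrastive divergence described above can be sketched with a plain binary RBM trained by one-step contrastive divergence (CD-1), using the standard update ΔW_ij = ε(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_recon). This is a generic sketch under stated assumptions, not the model of this study: it has no similarity or popularity layers and no user groups, and the class name `TinyRBM`, the toy rating matrix, and the hyperparameters are all made up for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyRBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)  # visible biases
        self.b = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.a)

    def cd1_step(self, v0):
        """One CD-1 update on a batch v0 of shape (batch, n_visible)."""
        ph0 = self.hidden_probs(v0)
        # Gibbs step: sample binary hidden states, then reconstruct the visibles.
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        # Positive-phase statistics minus negative-phase statistics.
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.a += self.lr * (v0 - pv1).mean(axis=0)
        self.b += self.lr * (ph0 - ph1).mean(axis=0)
        # Reconstruction RMSE of this batch, used as a training diagnostic.
        return float(np.sqrt(np.mean((v0 - pv1) ** 2)))


# Toy binary "liked / not liked" matrix: 6 users x 4 food items (made up).
data = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

rbm = TinyRBM(n_visible=4, n_hidden=2, lr=0.1)
errors = [rbm.cd1_step(data) for _ in range(2000)]
```

Tracking the reconstruction RMSE per step, as `errors` does here, mirrors the way the RMSE is monitored over epochs later in the paper.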
This study tunes the learning rate to lower the error rate. Training and testing with the sigmoid function are carried out for different numbers of hidden units. Having 20 hidden units yields the best results, as shown in the following graph, while experiments with many hidden units have a larger RMSE. When updating sub-weight matrices, the learned gradient (Eqs 19 and 20) is divided by the group size to prevent learning rate volatility caused by group size [29]. The best outcomes are attained with a learning rate of 0.01. However, performance improves at 0.05 when learned weights from pre-trained models are used. This study uses two activation functions: [31].
(19)
(20)
The root mean square error (RMSE), accuracy, sensitivity (recall), and specificity are some of the metrics used to assess performance [32]. The RMSE in Eq 21 measures the discrepancy between estimated and actual values across samples:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2} \]
(21)
where N is the size of the test set and Y_i − Ŷ_i is the residual difference between the actual (base) value and the predicted value.
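A minimal sketch of the RMSE computation follows. The function name `rmse` and the sample ratings are assumptions made for illustration.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up ratings on a 0-1 scale, for illustration only.
actual = [1.0, 0.0, 1.0, 1.0]
predicted = [0.5, 0.5, 0.5, 0.5]
error = rmse(actual, predicted)
```

Because the residuals are squared before averaging, a few large errors raise the RMSE more than many small ones.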
Hybrid model training & testing
Fig 10 depicts the hybrid model workflow, in which the presence of diabetes in a patient is determined first. If a patient has diabetes, the hybrid model feeds the input data into the deep learning algorithm (RBM) for dietary and exercise advice. Table 6 displays the experimental setup with the parameters and their values for the hybrid model. The first phase of training an RBM with multiple parameters is shown. The number of visible nodes corresponds to the number of rows in the weight matrix, while the number of hidden nodes corresponds to the number of columns. The first hidden node receives the vector product of the inputs and the first column of weights, to which the matching bias term is then added [33]. The RBMs share what they have learned. Each RBM takes into account a sub-weight matrix. If a food item is not visible or selected, it is excluded from the weight matrix update. Fig 10 has been redrawn from [7] to illustrate the workflow of the hybrid model. As Fig 10 shows, the model includes food items based on price and nutrition. The similarity and popularity of food products are represented by two additional layers, S and P. They indicate how well-liked particular foods are on menus where they appear and in the earlier behavior of similar individuals.
The visible layers in this model are not connected to one another, but the hidden layer (H) is fully connected to each of the visible layers (V, S, and P). During the backward pass, the inputs are reconstructed from the hidden activations. Through numerous forward and backward passes, an RBM learns to reproduce its input data [34].
An RBM belongs to the family of feature-extractor neural networks, which identify patterns in data. In training neural networks, one crucial hyperparameter is the learning rate. It sets the step size for updating the neural network’s weights during training. The learning rate can significantly impact the neural network’s performance and convergence. The appropriate learning rate depends on the dataset being used and the problem being addressed, and finding the ideal setting frequently requires testing various learning rates. Strategies such as learning rate scheduling and adaptive learning rates can dynamically change the learning rate during training.
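As an illustration of learning rate scheduling, the sketch below applies a simple step decay. The starting value of 0.05 echoes a rate reported in this study, while the decay factor, the interval, and the function name `step_decay` are assumptions; the study does not state that a schedule of this kind was used.

```python
def step_decay(initial_lr: float, epoch: int, drop: float = 0.5, every: int = 10) -> float:
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return initial_lr * (drop ** (epoch // every))


# Learning rate for each of 50 epochs: halved at epochs 10, 20, 30, and 40.
schedule = [step_decay(0.05, epoch) for epoch in range(50)]
```

A schedule like this takes large steps early in training and smaller ones later, which can reduce the fluctuation of the error in later epochs.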
Results
There are two stages to this research. First, this study determines whether a person has diabetes or not. This study suggests a diet and exercise recommender system for diabetic individuals at the following stage. The first stage of this work uses machine learning techniques to create a rapid and accurate approach for detecting diabetes mellitus, which is essential for human health. If diabetes is not treated at the right time, it can lead to heart disease, blindness, stroke, renal failure, sexual dysfunction, lower-limb amputation, and issues during pregnancy in women.
Those who are overweight, physically inactive, or have a family history of the disease are at an increased risk of developing diabetes. It is critical to recognize diabetes in its early stages. Early diagnosis of diabetes mellitus demands an approach different from earlier ones. Therefore, this study used the Pima Indians Diabetes Dataset (PIDD) and classifiers such as Support Vector Machine, Gradient Boosting, and Random Forest.
Table 7 shows the results obtained from the machine learning classifiers. Judged by Accuracy, F1 Score, Recall, and Precision, RF is the best choice in this research. Several machine learning classifiers were examined on the PIDD dataset, and Random Forest was chosen as the best classifier for detecting diabetes. In the second stage, which follows the identification of a patient as having diabetes, the study uses deep learning to create a diet and exercise recommendation system for diabetic patients. As the number of iterations rises, the model on the food dataset improves, first with the V and H layers only and then with the V, S, P, and H layers. With 50 hidden nodes and a learning rate of 0.01, the Root Mean Square Error (RMSE) was volatile, starting at 0.70 and steadily decreasing to 0.63 after 50 epochs. Because this configuration does not yield a sufficiently low error, different combinations may need to be tried to obtain recommendations.
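The classifier metrics named above (accuracy, precision, recall, and F1 score, together with specificity) can be computed from predicted and true labels as in the following sketch. The helper name `binary_metrics` and the sample labels are assumptions; the values are not results from Table 7.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute common binary classification metrics (1 = diabetic, 0 = non-diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up labels for eight patients, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting recall and specificity alongside accuracy matters for screening tasks, since a classifier can reach high accuracy on an imbalanced dataset while still missing many diabetic patients.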
With 200 hidden nodes and a learning rate of 0.01, the RMSE fluctuated over 50 epochs, peaking at 0.68. Because this configuration did not yield a sufficiently low error, as shown in Fig 11, we tried different combinations to obtain recommendations [7]. This study employs a deep learning method first with visible and hidden layers only, and then with similarity, popularity, visible, and hidden layers. Both RBM models are trained and tested using the logistic sigmoid function, 100 hidden units, and a learning rate of 0.05. The R-BM model with Similarity S and Popularity P yields superior results because the S and P layers give the hidden layer additional information about the ratings via similarity and popularity scores. After iterating on the model, the RMSE started at 0.70 and decreased to 0.63, with less variance after 50 epochs.
With 200 hidden nodes and a learning rate of 0.01, the RMSE fluctuated throughout training, peaking at 0.68 before dropping to 0.50 after 50 epochs. Because this strategy did not produce a low enough error, this study tested several combinations, as shown in Fig 12, to produce recommendations. As a result, the error was reduced with various setups. The RMSE of the model with V, S, P, and H layers falls to 0.50 after 50 epochs with 100 hidden nodes, as shown in Fig 13, and continues to decrease until it reaches 0.25, which is the desired outcome for the suggested model. Based on pathology data, we then use the model’s output to suggest food and physical activity to diabetes patients. We format our recommendations for patients to see using the OS, SYS, and colored libraries. Our dataset contains the daily nutritional components for both males and females. We enter the pathology results and use the daily intake values to determine the intake each patient requires. The patient is asked to enter their credentials before we provide recommended daily intakes, which enables them to determine how much of the food items containing the required nutritional elements they need to consume each day.
Based on prediction accuracy, the proposed model creates a list of food items suggested for a user. This research provides a diet and exercise recommendation system that considers various input parameters [35]. The first pieces of data the recommender system needs are the patient’s age, number, and pathology reports. Pathology data are used to determine the required nutritional value, and the patient is then asked to rate the food item based on user reviews. The final essential nutrition choice is then presented to the patient, and a price range is requested for the suggested meals. Once processing has started, the algorithm uses the RBM reconstruction technique to provide the patient with a range of food options that supply the necessary nutrients at a reasonable price [36]. After taking in the necessary nutrients, the patient must engage in the recommended physical activities to maintain good health, a healthy lifestyle, and physical appearance. In this way, the patient contributes to making society a healthier place to live [37]. Based on user preferences, the suggested model suggests 100 worth of food goods for each patient. There are some estimated diet and activity recommendations. Between 90 and 99 percent, there are about 40 food items per patient, and so on. The model’s output is then saved in an Excel file. Tables 8 and 9 show the overall physical activity and diet recommendations as well as the recommendations for each patient.
After the recommendation Excel file is imported from the system and the input parameters are entered, the inference algorithm shows food products based on reports and user preferences. This study presents the patient with advised food options, nutrient content, exercise, user preferences, and pricing. Finally, the proposed approach shows diet plans that follow user preferences and dietary constraints, although the inference technique displays only the 10 food products with 100% suggestions. These patient preferences for choosing meals based on nutrition are then added to the preprocessed dataset for further use. The model thus generates meal suggestions from a growing dataset in a cyclical process, and this study uses those suggestions to make recommendations for both food and exercise.
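The final display step, in which only the top-scoring items within the patient's price range are shown, can be sketched as follows. The `FoodScore` record, its field names, and the tie-breaking order are hypothetical and chosen for illustration; the study does not specify how its output file is structured.

```python
from dataclasses import dataclass


@dataclass
class FoodScore:
    """One recommended food item (hypothetical record layout)."""
    name: str
    score: float  # recommendation score in percent, 0 to 100
    price: float  # price in the patient's currency


def select_recommendations(items, max_price, min_score=100.0, limit=10):
    """Keep items within budget and at or above min_score, best first.

    Ties are broken by lower price and then by name, which is an
    assumption made here so that the output is deterministic.
    """
    eligible = [
        item for item in items
        if item.price <= max_price and item.score >= min_score
    ]
    eligible.sort(key=lambda item: (-item.score, item.price, item.name))
    return eligible[:limit]


if __name__ == "__main__":
    catalogue = [
        FoodScore("lentil soup", 100.0, 2.5),
        FoodScore("grilled fish", 100.0, 6.0),
        FoodScore("brown rice", 95.0, 1.5),
        FoodScore("mixed salad", 100.0, 3.0),
    ]
    for item in select_recommendations(catalogue, max_price=5.0):
        print(item.name, item.score, item.price)
```

Lowering `min_score` to 90 would also surface the items in the 90 to 99 percent band mentioned earlier.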
Discussion
To anticipate diabetes, this paper proposed classification models that can be used with electronic diagnostic devices implemented in hospitals. Using eight specified variables from the PIMA Indians dataset, the models were trained using three machine-learning techniques and assessed to determine whether a subject’s diabetes mellitus diagnosis was positive. The experimental findings demonstrate that, when compared to [38], the random forest classifier performed better on the entire Pima Indian Diabetes dataset than the SVM and other classifiers in terms of accuracy (97%), precision (96%), f-score (95%), and recall (95%). The random forest classification model also beat the SVM and KNN models in terms of accuracy for the subsets that employed feature selection. On the PIMA Indians dataset, the KNN classifier outperformed the SVM model with an accuracy of 78%, compared with 96% for the highest accuracy in this experiment. We may conclude that a random forest performs better with more features and with more precise feature selection for binary classification, but struggles with many correlated features. The models used in this experiment have accuracy levels close to 90%, except for random forest at 96%; these findings are consistent with earlier studies [39, 40] and leave room for improvement. Encouragingly, our method did not exhibit overfitting; in other words, the outcomes are more realistic. The current guidelines indicate that age and weight (indicated by BMI and skin fold thickness) are significant factors in the diagnosis and occurrence of diabetes mellitus.
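For reference, the four metrics reported above follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the toy labels are illustrative and are not taken from the study's experiments.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-score for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Toy labels: 1 = diabetic, 0 = non-diabetic.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, predicted))
```

Because accuracy alone can look high on an imbalanced dataset, reporting precision, recall, and F-score alongside it gives a fuller picture of classifier behaviour.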
Presently, recommender systems that rely on predictions are crucial in anticipating user behavior in social interactions. Forecasting a user’s behavior is difficult because of privacy issues and the scarcity of navigation records. To forecast and suggest diet and physical exercise to a diabetic user, we propose in this work a diet recommender model with a joint distribution conditional on similarity and popularity scores [41]. Pre-training with the sample subset allowed the ideal model parameters to be identified. According to our experiments, 100 hidden units and a learning rate of 0.05 yield reasonably good results, and the sigmoid function yields the best accuracy for the hidden layer. Because of the similarity and popularity scores, the hybrid model performs better than the cluster-based RBM models in terms of RMSE and accuracy, even though the PIMA dataset contains sparse data.
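One simple way to combine similarity and popularity into a single ranking signal is sketched below. It is only an illustration: the cosine similarity measure, the normalisation of popularity by the largest rating count, and the mixing weight `alpha` are all assumptions, and the model in this work conditions an RBM on these scores rather than taking a weighted sum.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def popularity_scores(rating_counts):
    """Scale rating counts to [0, 1] by dividing by the largest count."""
    counts = np.asarray(rating_counts, dtype=float)
    top = counts.max()
    return counts / top if top > 0 else np.zeros_like(counts)


def hybrid_scores(user_profile, item_features, rating_counts, alpha=0.5):
    """Weighted sum of similarity and popularity (alpha is an assumed weight)."""
    sims = np.array([cosine_similarity(user_profile, f) for f in item_features])
    pops = popularity_scores(rating_counts)
    return alpha * sims + (1.0 - alpha) * pops


if __name__ == "__main__":
    profile = np.array([1.0, 0.0])            # hypothetical user preference vector
    features = np.array([[1.0, 0.0],          # item aligned with the profile
                         [0.0, 1.0]])         # item orthogonal to the profile
    counts = [5, 10]                          # how often each item was rated
    print(hybrid_scores(profile, features, counts))
```

With sparse rating data, the popularity term gives rarely rated users a sensible fallback, while the similarity term personalises the ranking once preferences are known.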
Conclusion
Early detection of both types of diabetes mellitus (DM) will make organizing prompt interventions easier and raise awareness of the disease’s risk. After diabetes is detected, the hybrid model recommends diet and exercise. A healthy diet is essential for those with a range of disorders. This paper describes a recommender system for diet and exercise that can provide users with customized, healthy nutrition advice based on their preferences and pathological medical data. This study demonstrates how ratings of food products can be modeled using the Restricted Boltzmann Machine. With more than 100 million user-rated items, this study also indicates the practicality of employing RBMs on food data. The proposed research produces an error rate of less than 0.30% in 50 epochs using 100 hidden nodes. Rather than depending solely on medication and expensive visits to the doctor, this study allows patients to eat nutrient-rich foods and practice preventive medicine. In the future, the model will be trained using several deep-learning methods, and a user-friendly mobile and web interface will be offered to diabetic patients. This study aims to provide improvements on smart devices to give diabetic patients real-time dietary advice, leading to a healthier society.
References
- 1. Kaur H, Kumari V. Predictive modelling and analytics for diabetes using a machine learning approach. Applied Computing and Informatics. 2022;18(1-2):90–100.
- 2. WHO. Global Action Plan for the Prevention and Control of NCDs 2013-2020; 2023. Available from: https://www.who.int/publications/i/item/9789241506236.
- 3. Alex SA, Nayahi JJV, Shine H, Gopirekha V. Deep convolutional neural network for diabetes mellitus prediction. Neural Computing and Applications. 2022;34(2):1319–1327.
- 4. Ahmed U, Issa GF, Khan MA, Aftab S, Khan MF, Said RAT, et al. Prediction of Diabetes Empowered With Fused Machine Learning. IEEE Access. 2022;10:8529–8538.
- 5. Chang V, Bailey J, Xu QA, Sun Z. Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Computing and Applications. 2022. pmid:35345556
- 6. Singh S, Vazirani V. Classification vs Clustering: Ways for Diabetes Detection. 2022 IEEE 7th International conference for Convergence in Technology, I2CT 2022. 2022.
- 7. Sajid M, Aslam N, Abid MK, Fuzail M. RDED: Recommendation of Diet and Exercise for Diabetes Patients using Restricted Boltzmann Machine. VFAST Transactions on Software Engineering. 2022; p. 37–55.
- 8. Tabassum N, Rehman A, Hamid M, Saleem M, Malik S, Alyas T. Intelligent nutrition diet recommender system for diabetic’s patients. Intelligent Automation and Soft Computing. 2021;30(1):319–335.
- 9. Rita L. Building a food recommendation system: Machine Learning to prevent and treat cancer through nutrition. Towards Data Science. 2020.
- 10. Chaki J, Ganesh ST, Cidham SK. Machine learning and artificial intelligence based diabetes mellitus detection and self-management: a systematic review. Journal of King Saud …. 2020.
- 11. Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. Advances in Computational Intelligence. 2022;2(2). pmid:35434723
- 12. Laila Ue, Mahboob K, Khan AW, Khan F, Taekeun W. An Ensemble Approach to Predict Early-Stage Diabetes Risk Using Machine Learning: An Empirical Study. Sensors. 2022;22(14).
- 13. Pal S, Mishra N, Bhushan M, Kholiya PS, Rana M, Negi A. Deep Learning Techniques for Prediction and Diagnosis of Diabetes Mellitus. In: 2022 International Mobile and Embedded Technology Conference, MECON 2022; 2022. p. 588–593.
- 14. Vosta S, Yow KC. A CNN-RNN Combined Structure for Real-World Violence Detection in Surveillance Cameras. Applied Sciences (Switzerland). 2022;12(3):1021.
- 15. Elias D, Maria T. Data-Driven Machine-Learning Methods for Diabetes Risk Prediction. Sensors. 2022;22:5304.
- 16. Jader R, Aminifar S. Fast and Accurate Artificial Neural Network Model for Diabetes Recognition. NeuroQuantology. 2022;August(1).
- 17. Howlader KC, Satu MS, Awal MA, Islam MR, Islam SMS, Quinn JMW, et al. Machine learning models for classification and identification of significant attributes to detect type 2 diabetes. Health Information Science and Systems. 2022;10(1). pmid:35178244
- 18. Boutilier JJ, Chan TCY, Ranjan M, Deo S. Risk Stratification for Early Detection of Diabetes and Hypertension in Resource-Limited Settings: Machine Learning Analysis. Journal of Medical Internet Research. 2021;23(1). pmid:33475518
- 19. Rezaei M, Jafari-Sadeghi V, Cao D, Mahdiraji HA. Key indicators of ethical challenges in digital healthcare: A combined Delphi exploration and confirmative factor analysis approach with evidence from Khorasan province in Iran. Technological Forecasting and Social Change. 2021;167.
- 20. Javed U, Shaukat K, Hameed IA, Iqbal F, Alam TM, Luo S. A Review of Content-Based and Context-Based Recommendation Systems. International Journal of Emerging Technologies in Learning. 2021;16(3):274–306.
- 21. Ko H, Lee S, Park Y, Choi A. A Survey of Recommendation Systems: Recommendation Models, Techniques, and Application Fields. Electronics (Switzerland). 2022;11(1).
- 22. Himeur Y, Sohail SS, Bensaali F, Amira A, Alazab M. Latest trends of security and privacy in recommender systems: A comprehensive review and future perspectives. Computers and Security. 2022;118.
- 23. Pavan Kumar I, Mahaveerakannan R, Praveen Kumar K, Basu I, Anil Kumar TC, Choche M. A Design of Disease Diagnosis based Smart Healthcare Model using Deep Learning Technique. In: Proceedings of the International Conference on Electronics and Renewable Systems, ICEARS 2022; 2022. p. 1444–1449.
- 24. Waghade SS, Karandikar AM. A comprehensive study of healthcare fraud detection based on machine learning. International Journal of Applied Engineering Research. 2018;13(6):4175–4178.
- 25. USDA. Diabetes Dataset; 2023. Available from: https://fdc.nal.usda.gov/download-datasets.html.
- 26. Goyal K. Data Preprocessing in Machine Learning; 2023. Available from: https://www.upgrad.com/blog/data-preprocessing-in-machine-learning/.
- 27. Nguyen HV, Bai L. Cosine similarity metric learning for face verification. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); 2011. p. 709–720.
- 28. Yedder HB, Zakia U, Ahmed A, Trajković L. Modeling prediction in recommender systems using restricted boltzmann machine. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2017. vol. 2017-Janua; 2017. p. 2063–2068.
- 29. Hinton GE. A practical guide to training restricted boltzmann machines. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2012;7700:599–619.
- 30. Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. In: ACM International Conference Proceeding Series. vol. 227; 2007. p. 791–798.
- 31. Bishop CM, Nasrabadi NM. Pattern recognition and machine learning. 4. Springer; 2006.
- 32. Gunawardana A, Shani G. A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research. 2009;10:2935–2962.
- 33. Sharma A. Restricted Boltzmann Algorithm; 2023. Available from: https://towardsdatascience.com/restricted-boltzmann-machines-simplified-eab1e5878976.
- 34. Tracyrenee. Restricted Boltzmann Working Principle; 2023. Available from: https://medium.datadriveninvestor.com/an-intuitive-introduction-of-restricted-boltzmann-machine-rbm.
- 35. Polamuri S. Introduction to Recommendation Engine. Oracle. 2020; p. 1–7.
- 36. Cheung KL, Durusu D, Sui X, de Vries H. How recommender systems could support and enhance computer-tailored digital health programs: A scoping review. Digital Health. 2019;5. pmid:30800414
- 37. Heather Grey MB. 10 Exercises for Diabetes: Walking, Yoga, Swimming, and More; 2023. Available from: https://www.healthline.com/health/type-2-diabetes/top-exercises.
- 38. Chang V, Bailey J, Xu QA, Sun Z. Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Computing and Applications. 2023;35:16157–16173.
- 39. Iyer A, Jeyalatha S, Sumbaly R. Diagnosis of diabetes using classification mining techniques. arXiv preprint arXiv:1502.03774. 2015.
- 40. Cheng D, Ting C, Ho C, Ho C. Performance evaluation of explainable machine learning on non-communicable diseases. Solid State Technol. 2020;63:2780–2793.
- 41. Yang L, Hsieh CK, Yang H, Pollak JP, Dell N, Belongie S, et al. Yum-me: a personalized nutrient-based meal recommender system. ACM Transactions on Information Systems (TOIS). 2017;36:1–31. pmid:30464375