Figures
Abstract
Background: Cardiovascular diseases (CVD) are one of the leading global causes of death, which requires an accurate early prediction. This study aimed to develop transparent machine learning (ML) models using National Health and Nutrition Examination Survey (NHANES) data from 2017–2023 to predict CVD risk based on dietary and health factors.
Methods: We analyzed data from 12,382 adults (aged 18 and older) from NHANES 2017–2023, including 41 dietary, anthropometric, clinical, and demographic variables. Recursive Feature Elimination (RFE) was used to select an optimal subset of 30 predictors. To address substantial class imbalance in the outcome, we applied the Random Over-Sampling Examples (ROSE) technique to the training data. Five machine learning models—Logistic Regression, Random Forest, Support Vector Machines, XGBoost, and LightGBM—were trained and evaluated. Model interpretability was assessed using LIME and SHAP.
Results: Participants with CVD differed significantly from those without CVD in age, waist circumference, systolic blood pressure, C-reactive protein (CRP), and multiple dietary nutrients, with a consistently lower nutrient intake in the CVD group. Among the ML models evaluated, XGBoost achieved the highest accuracy (0.8216) and recall (0.8645), while Random Forest showed the highest AUROC (0.8139). Interpretability analyses identified age as the strongest predictor, followed by vitamin B12, total cholesterol, CRP, and waist circumference.
Conclusion: Interpretable ML models effectively identified key dietary and clinical factors for CVD risk. Nutrients like vitamin B12 and niacin, alongside established clinical indicators, emerged as significant predictors, underscoring their potential role in nutritional interventions and public health strategies for CVD prevention.
Citation: Ahiduzzaman M, Hasan MN (2025) Interpretable machine learning for cardiovascular risk prediction: Insights from NHANES dietary and health data. PLoS One 20(11): e0335915. https://doi.org/10.1371/journal.pone.0335915
Editor: Li-Da Wu, Nanjing First Hospital, Nanjing Medical University, CHINA
Received: June 10, 2025; Accepted: October 17, 2025; Published: November 6, 2025
Copyright: © 2025 Ahiduzzaman, Nahid Hasan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used in this study are publicly available from the National Health and Nutrition Examination Survey (NHANES) website at https://www.cdc.gov/nchs/nhanes. The raw data is also available at (https://www.kaggle.com/datasets/ahiduzzaman28/nhanes-cvd-raw-data-2017-23).
Funding: The author(s) received no specific funding for this work.
Competing interests: No authors have competing interests.
Introduction
Cardiovascular disease (CVD) refers to a group of disorders that affect the heart and blood vessels, including coronary artery disease, cerebrovascular disease, and peripheral arterial disease. These conditions can impair the cardiovascular system’s ability to circulate blood efficiently and regulate vascular function, often leading to serious health outcomes such as coronary heart disease, congestive heart failure, stroke, or premature death. Due to its complex and multifactorial nature—driven by behavioral, metabolic, environmental, and genetic factors—CVD remains a leading cause of morbidity and mortality worldwide. In recent years, machine learning (ML) approaches have gained momentum in CVD research for their capacity to analyze high-dimensional health data and improve early risk prediction and classification.
Several studies have applied ML to predict CVD and related outcomes using integrated health data. Recently, Mao et al. [1] reviewed the application of the ML algorithm in the diagnosis of heart diseases. Klados et al. [2] developed a machine learning framework to predict CVD based on physical functioning and health status, while Ngamdu et al. [3] found that advanced periodontal disease was strongly associated with higher CVD risk. Dinh et al. [4] employed ensemble models to identify key predictors of CVD and diabetes. More broadly, ML has shown promise in modeling a variety of health outcomes beyond cardiovascular disease. For instance, recent studies have applied ML techniques to predict conditions such as asthma and insomnia [5,6]. These findings highlight the capacity of ML to uncover complex, multidomain relationships in population health—particularly when applied to large-scale, nationally representative datasets that include clinical, dietary, and biomarker information.
One such resource is the National Health and Nutrition Examination Survey (NHANES), a nationally representative dataset widely used to study cardiovascular disease, particularly in identifying and analyzing its risk factors. Prior studies have highlighted the predictive value of various biomarkers and exposures, such as the FT3/FT4 thyroid hormone ratio [7], heavy metals like cadmium and lead [8], and trends in hypertension and lipid treatment across obesity levels [9]. Other work has linked micronutrients like vitamin D and cryptoxanthin to reduced inflammation and oxidative stress [10,11], while caffeine intake has shown mixed associations depending on dosage [12]. NHANES has also supported research on comorbidities and disparities, including the link between COPD and CVD [13], racial/ethnic differences in cardiometabolic outcomes [14], and the role of dietary data [15].
However, most studies to date have often examined individual risk factors independently, without fully utilizing the diverse types of health information available across domains. To address this gap, we developed an integrated ML framework using the most recent NHANES data from 2017 to 2023. We incorporate macro- and micronutrient intake, demographics, laboratory biomarkers, anthropometric measures, and clinical examination data to predict CVD risk. Multiple ML classifiers—including logistic regression (LR), random forests (RF), support vector machines (SVM), XGBoost, and LightGBM—are evaluated to identify optimal performance. We further apply SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) to enhance model transparency and provide insights into the most influential predictors.
This research contributes to the growing field of data-driven CVD prevention by demonstrating how machine learning can effectively integrate diverse health data while providing interpretable results that could inform targeted public health interventions.
Materials and methods
Data source and study population
This study utilized data from the National Health and Nutrition Examination Survey (NHANES) spanning 2017–2023, encompassing both pre-pandemic (2017–2020) and post-pandemic (2021–2023) periods [16]. NHANES is a nationally representative survey conducted by the National Center for Health Statistics that combines interviews, physical examinations, and laboratory tests to assess the health and nutritional status of the U.S. population.
The analysis included 12,382 adults aged 18 years and older after data preprocessing. We integrated multiple NHANES components: dietary intake data, demographic information, anthropometric measurements, clinical examinations, laboratory biomarkers, and self-reported medical conditions.
Outcome variable
The primary outcome was a binary indicator of cardiovascular disease (CVD) based on self-reported physician diagnoses. Participants were classified as having CVD (coded as 1) if they reported ever being diagnosed with angina, congestive heart failure, coronary heart disease, heart attack, or stroke. These conditions are recognized as key indicators of cardiovascular disease [17]. Those reporting none of these conditions were classified as CVD-free (coded as 0). The classification process is summarized in Table 1.
The co-occurrence matrix in Fig 1 illustrates the frequency with which different cardiovascular diseases occur together within the dataset, providing insights into potential comorbidities and patterns of disease co-existence among individuals.
Predictor variables
In this study, we selected 41 features across five key domains: Dietary nutrients: Macronutrients (protein, carbohydrates, fats), vitamins (B6, B12, C, D, E, K, thiamin, riboflavin, niacin, folate), minerals (calcium, iron, magnesium, zinc, copper, potassium, sodium, selenium), and other compounds (beta-carotene, lutein, fiber); Anthropometric measures: Body Mass Index (BMI) and waist circumference; Clinical markers: Systolic and diastolic blood pressure; Laboratory biomarkers: Total cholesterol and C-reactive protein; and Demographic factors: Age.
Data preprocessing
We implemented a systematic strategy to address missing data, perform feature selection, and handle class imbalance prior to model development. The preprocessing workflow is illustrated in Fig 2.
Initially, we addressed missing data through a two-stage approach. First, we performed complete-case analysis for dietary variables and the outcome-CVD. We observed uniform missingness in dietary variables, which is approximately 30.6%. Second, for remaining variables with lower missingness rates (1.14%–7.13%), we applied Multiple Imputation by Chained Equations (MICE) [18] using Predictive Mean Matching, generating five imputed datasets with 50 iterations each.
Next, we applied feature selection to enhance model accuracy and interpretability. Recursive Feature Elimination (RFE) with random forest as the base estimator was used, incorporating 5-fold cross-validation repeated five times. This process identified 30 optimal features that contributed most to predictive performance. The selected variables covered all original domains and included important predictors such as age, total cholesterol, waist circumference, blood pressure, and key vitamins and minerals. A justification plot is given in the result section Fig 3.
The final analytical dataset comprised 12,382 participants with 30 carefully selected features, providing a robust foundation for cardiovascular disease prediction modeling. Details is shown in Table 2.
Modeling approach
The final analytical dataset was first split into training (80%) and testing (20%) subsets. Due to the substantial class imbalance in the outcome variable (with 87.7% (10,864) non-CVD cases 12.3% (1,518) CVD cases ), we applied the Random Over-Sampling Examples (ROSE) technique [19] to the training data, keeping the test set untouched. This approach synthetically balanced the classes, resulting in a training set with 4,592 CVD and 4,695 non-CVD cases. Using this balanced training set, we trained five machine learning classifiers: Logistic Regression (LR) [20], Random Forests (RF) [21], Support Vector Machines (SVM) [22], Extreme Gradient Boosting (XGBoost) [23], and Light Gradient Boosting Machine (LightGBM) [24]. Finally, model performance was evaluated on the test set using metrics including Accuracy, Precision, Recall, Specificity, F1 Score, and Area Under the ROC Curve (AUROC).
To interpret contributions of each feature to the model, we employed Local Interpretable Model-agnostic Explanations (LIME) [25] and SHapley Additive exPlanations (SHAP) [26]. LIME provided insights into individual predictions, while SHAP analyses facilitated a detailed decomposition of global and local feature influences, enhancing both interpretability and model transparency. SHAP visualizations were generated to intuitively demonstrate the relative importance of features on model predictions.
Model specifications
To ensure reproducibility, we report the exact settings used for each classifier. We did not run an extensive hyperparameter search; instead we used literature-guided defaults with early stopping where applicable, applied to the training split only (ROSE-balanced; the test set was untouched). The SVM (e1071) used a radial basis function (RBF) kernel with cost and
(package default; p = 30 predictors after RFE), with probability=TRUE to obtain class probabilities. XGBoost (xgboost) was fit with objective=binary:logistic, eval_metric=auc, max_depth=8, eta=0.1, subsample=0.8, colsample_bytree=0.8, min_child_weight=1, scale_pos_weight=1, up to 1000 rounds with early stopping at 50 rounds. Random Forest (randomForest) used ntree=500 with default
. Logistic regression (stats::glm) used a binomial logit link with all selected predictors. LightGBM (lightgbm) was trained with objective=binary, metric=auc, learning_rate=0.01, num_leaves=31, and scale_pos_weight set to the inverse class ratio in the training split, with early stopping at 50 rounds. All models used the same 30-feature set selected by RFE and fixed random seeds.
Ethics statement
The data used in this study were obtained from the National Health and Nutrition Examination Survey (NHANES), a publicly available dataset provided by the National Center for Health Statistics (NCHS). NHANES is conducted in compliance with the ethical guidelines established by the NCHS, and all participants provided informed consent at the time of data collection. This study utilized de-identified publicly available data. The analysis adhered to all applicable ethical standards for research using secondary data.
Results
Descriptive statistics
Table 2 summarizes the dietary, demographic, anthropometric, and clinical characteristics of the study population, stratified by CVD status. Significant group differences (with p-values <0.05) were found in key demographic and clinical measures, including age, waist circumference, systolic blood pressure, and C-reactive protein. Participants diagnosed with CVD (Class 1) showed significantly lower mean dietary intake of nutrients, including protein, carbohydrates, sugars, fiber, vitamins B6, C, K, iron, zinc, and magnesium compared to participants without CVD (Class 0). However, no significant differences were observed for cryptoxanthin (p-value = 0.30) and vitamin B12 (p-value = 0.092). The RFE plot in Fig 3 shows the performance stabilized after 15 predictors and peaked at 30 predictors with CV accuracy .We chose 30 features because the RFE curve peaked at 30 and this subset retains most of the nutrient profile alongside key clinical CVD markers—whereas a 15-feature subset would discard many nutrition variables central to our research question.
Performance of machine learning models
Table 3 summarizes the performance of five ML models evaluated. XGBoost offers the best balance of accuracy (0.8216) and recall (0.8645), which is essential for detecting high-risk individuals. LR provides the highest specificity (0.7810) and precision (0.9576), making it particularly effective at minimizing false-positive predictions. RF (0.8790) and LightGBM (0.8883) achieved strong F1-scores, reflecting balanced performance between precision and recall. Overall, the results suggest that ensemble-based models—particularly XGBoost and LightGBM—are well-suited for capturing complex patterns in the data and offer strong predictive performance for CVD risk classification.
Fig 4 compares the ROC curves for all classifiers and illustrates their relative ability to distinguish between CVD and non-CVD cases. RF achieved the highest AUROC (0.814), with XGBoost (0.808), LR and LightGBM (both 0.807), and SVM (0.798) showing comparable but slightly lower performance.
Feature importance and model interpretability
LIME and SHAP were used to assess feature importance and model interpretability, as shown in Fig 5 and Fig 6, respectively. LIME analyzes of individual predictions frequently highlighted age, vitamin B12, total cholesterol, C-reactive protein (CRP), waist circumference and niacin as key factors that influence predictions, with specific levels of nutrients influencing risk estimates.
SHAP analysis provided a comprehensive view of feature contributions across the dataset. Higher values of age and total cholesterol were consistently correlated with increased SHAP values, indicating higher predicted cardiovascular risk. In addition to clinical factors, dietary factors such as vitamin B12 and niacin also contribute significantly to the prediction of cardiovascular risk. These insights underscore the potential value of incorporating detailed nutrient-level data into predictive models for the assessment of CVD risk.
Interestingly, The SHAP beeswarm plot shows for the Age and Total cholesterol there might be a interaction effect. We have therefore assessed the assumption with a SHAP dependency plot. Fig 7 shows how Total_Cholesterol relates to the model’s predicted CVD risk, with points colored by Age. Each point is one participant. The vertical axis shows whether cholesterol moved the prediction up (positive values) or down (negative values). Two simple patterns appear: (i) among older adults with low cholesterol, cholesterol often pushes risk up; and (ii) above about 180–200 mg/dL, cholesterol adds little extra signal once other strong factors (e.g., age, waist size, blood pressure, CRP) are already considered. This should not be read as “high cholesterol is protective”—it only means that its added contribution is small in those regions. The far right tail has few points, so those extremes should be interpreted cautiously. Medication use was not available and may influence the low-cholesterol/older cluster.
Discussion
In this study, we developed and evaluated five ML models to predict a composite CVD outcome using the nationally representative dataset from NHANES 2017–2023. Our results indicated that the ensemble models, specifically XGBoost, achieved the highest accuracy (0.8216) and recall (0.8645), while RF demonstrated the best overall discriminative performance with an AUROC of 0.8139.
The overall performance of XGBoost and RF of this study aligns with findings from broader literature. In a meta-analysis, Krittanawong et al. [27], indicated that boosting algorithms were promising for coronary artery disease prediction. In another study, Azmi et al. [28] and Naser et al. [29] also frequently reported RF as a high-performing algorithm in CVD prediction. However, a direct comparison of the performance of the ML models should be made with caution due to differences in the datasets and the definition of the outcome variable [30,31]. Our results remain robust and appropriate for the complexity of a large nationally representative survey.
In addition, this study demonstrates that integrating detailed dietary nutrient profiles with demographic and clinical variables can enhance the prediction of CVD risk. Interpretability analyses using LIME and SHAP confirmed the importance of both clinical and nutritional features—particularly age, CRP, total cholesterol, and vitamin B12—in shaping CVD risk predictions. For instance, SHAP values indicated how varying levels of these features, including specific nutrients, shifted the predicted risk. This in-depth perspective on the role of individual nutrients is a key outcome, as many studies on CVD prediction tend to focus on broader dietary patterns or a more limited set of biochemical markers rather than an extensive profile of 26 dietary nutrients as explored here [28,30,31]. Our results highlight the importance of nutrients such as vitamin B12 and niacin in the risk of cardiovascular disease, aligning with recent findings by Yang et al. [32]. Further, demonstrating the value of interpretable machine learning methods to uncover specific associations between nutrient diseases.
A distinct strength of this research, beyond the detailed nutrient analysis, is the use of NHANES data from 2017–2023, which uniquely includes the post-pandemic period, a time frame largely unaddressed in the cited comparative literature [5,6,27,28]. This allows our findings to reflect potentially newer trends in population health and nutrition. The methodological rigor, including systematic handling of missing data (MICE), addressing class imbalance (ROSE)—a challenge also noted by Ogunpola et al. [30] and Naser et al. [29] —and robust feature selection using Recursive Feature Elimination (RFE) further strengthens our study’s validity. At the global level, the SHAP summary plot showed that higher Age, larger Waist_circ, and elevated blood pressure increased predicted CVD risk, whereas lower values reduced it. Total_Cholesterol displayed a non-linear, age-dependent pattern: very low values in older adults tended to raise predicted risk, while values above ∼180–200 mg/dL added little incremental signal once age, adiposity, blood pressure, and CRP were accounted for. LIME panels for borderline test cases clarified which features pushed an individual prediction up (blue) or down (red), providing person-level transparency.
From a public-health perspective, these patterns emphasize modifiable targets—central adiposity, blood pressure, and inflammation—alongside specific nutrient signals (e.g., vitamin B12, niacin). In practice, risk estimates could help prioritize counseling and hypertension control for higher-burden subgroups (e.g., older adults and those with central obesity); findings are associative rather than causal.
CVD status was derived from self-reported, physician-diagnosed conditions. While this approach is standard in population surveillance, it is subject to recall error and misclassification. We anticipate that any non-differential misclassification would reduce associations.
Future research should aim to validate these findings using longitudinal data to better explore causal pathways, especially for identified nutritional predictors. Incorporating objectively measured CVD outcomes and further investigating the specific impact of the pandemic period on CVD risk factors are also important next steps. Stability analysis of model performance and feature importance, for instance, through bootstrap simulations as suggested by Huang et al. [33], would add another layer of robustness to the findings.
Conclusion
Using a nationally representative NHANES sample spanning pre- and post-pandemic cycles, we applied explainable machine learning to a detailed profile of dietary and non-dietary variables and identified factors associated with prevalent CVD among U.S. adults. Model explanations (e.g., LIME and SHAP) and probability calibration analyses provide transparency about how predictors contribute to risk estimates within this dataset. These results are hypothesis-generating and may help prioritize variables and subgroups for further study rather than support individualized clinical decision-making at this stage.
It is important to note some limitations. First, CVD status was self-reported, which may introduce recall or classification error. Second, the cross-sectional design precludes causal inference; our findings should not be interpreted as evidence that modifying specific dietary components will reduce CVD risk. Third, 24-hour dietary recalls may not reflect long-term intake, and the complete-case approach for variables with substantial missingness reduced the analytic sample and may introduce selection bias. Finally, while model interpretability tools aid understanding, model complexity and potential dataset shift limit generalizability. External and temporal validation, incorporation of medication and additional clinical biomarkers, and prospective evaluation of calibrated thresholds are important next steps before considering clinical application [33].
Supporting information
S1 File. Supplementary information.
Calibration curves before and after post-hoc calibration (S1 Fig); LIME explanations for the undersampled model (S2 Fig); and test-set discrimination and calibration metrics (S1 Table).
https://doi.org/10.1371/journal.pone.0335915.s001
(PDF)
References
- 1. Mao Y, Jimma BL, Mihretie TB. Machine learning algorithms for heart disease diagnosis: a systematic review. Current Problems in Cardiology. 2025;50(8):103082.
- 2. Klados GA, Politof K, Bei ES, Moirogiorgou K, Anousakis-Vlachochristou N, Matsopoulos GK, et al. Machine learning model for predicting CVD risk on NHANES data. Annu Int Conf IEEE Eng Med Biol Soc. 2021;2021:1749–52. pmid:34891625
- 3. Istanbuly S, Matetic A, Roberts DJ, Myint PK, Alraies MC, Van Spall HG, et al. Relation of extracardiac vascular disease and outcomes in patients with diabetes (1.1 million) hospitalized for acute myocardial infarction. Am J Cardiol. 2022;175:8–18. pmid:35550818
- 4. Dinh A, Miertschin S, Young A, Mohanty SD. A data-driven approach to predicting diabetes and cardiovascular disease with machine learning. BMC Med Inform Decis Mak. 2019;19(1):211. pmid:31694707
- 5. Huang AA, Huang SY. Use of feature importance statistics to accurately predict asthma attacks using machine learning: a cross-sectional cohort study of the US population. PLoS One. 2023;18(11):e0288903. pmid:37992024
- 6. Huang AA, Huang SY. Use of machine learning to identify risk factors for insomnia. PLoS One. 2023;18(4):e0282622. pmid:37043435
- 7. Lang X, Li Y, Zhang D, Zhang Y, Wu N, Zhang Y. FT3/FT4 ratio is correlated with all-cause mortality, cardiovascular mortality and cardiovascular disease risk: NHANES 2007 -2012. Front Endocrinol (Lausanne). 2022;13:964822. pmid:36060933
- 8. Agarwal S, Zaman T, Tuzcu EM, Kapadia SR. Heavy metals and cardiovascular disease: results from the National Health and Nutrition Examination Survey (NHANES) 1999-2006. Angiology. 2011;62(5):422–9. pmid:21421632
- 9. Saydah S, Bullard KM, Cheng Y, Ali MK, Gregg EW, Geiss L, et al. Trends in cardiovascular disease risk factors by obesity level in adults in the United States, NHANES 1999 -2010. Obesity (Silver Spring). 2014;22(8):1888–95. pmid:24733690
- 10. Ruiz-García A, Pallarés-Carratalá V, Turégano-Yedro M, Torres F, Sapena V, Martin-Gorgojo A, et al. Vitamin D supplementation and its impact on mortality and cardiovascular outcomes: systematic review and meta-analysis of 80 randomized clinical trials. Nutrients. 2023;15(8):1810. pmid:37111028
- 11. Wang M, Tang R, Zhou R, Qian Y, Di D. The protective effect of serum carotenoids on cardiovascular disease: a cross-sectional study from the general US adult population. Front Nutr. 2023;10:1154239. pmid:37502714
- 12. Zulli A, Smith RM, Kubatka P, Novak J, Uehara Y, Loftus H, et al. Caffeine and cardiovascular diseases: critical review of current research. Eur J Nutr. 2016;55(4):1331–43. pmid:26932503
- 13. Chen H, Luo X, Du Y, He C, Lu Y, Shi Z, et al. Association between chronic obstructive pulmonary disease, cardiovascular disease in adults aged 40 years and above: data from NHANES 2013 -2018. BMC Pulm Med. 2023;23(1):318. pmid:37653498
- 14. Lopez-Neyman SM, Davis K, Zohoori N, Broughton KS, Moore CE, Miketinas D. Racial disparities, prevalence of cardiovascular disease risk factors, cardiometabolic risk factors and cardiovascular health metrics among US adults: NHANES 2011 -2018. Sci Rep. 2022;12(1):19475. pmid:36376533
- 15. Ahluwalia N, Dwyer J, Terry A, Moshfegh A, Johnson C. Update on NHANES dietary data: focus on collection, release, analytical considerations, and uses to inform public policy. Adv Nutr. 2016;7(1):121–34. pmid:26773020
- 16.
Centers for Disease Control and Prevention. National Health and Nutrition Examination Survey (NHANES), 2017–2020. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. 2020. https://www.cdc.gov/nchs/nhanes/index.htm
- 17.
Centers for Disease Control and Prevention. Indicator definitions – cardiovascular disease. 2024. https://www.cdc.gov/cdi/indicator-definitions/cardiovascular-disease.html
- 18. Buuren S van, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations inR. J Stat Soft. 2011;45(3).
- 19. Lunardon N, Menardi G, Torelli N. ROSE: a package for binary imbalanced learning. The R Journal. 2014;6(1):79.
- 20. Cox DR. The regression analysis of binary sequences. J R Stat Soc Series B. 1958;20(2):215–32.
- 21.
Tin Kam Ho. Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition. p. 278–82. https://doi.org/10.1109/icdar.1995.598994
- 22. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.
- 23.
Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, CA; 2016. p. 785–94.
- 24.
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems. 2017. p. 3146–54. https://papers.nips.cc/paper_files/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html
- 25.
Ribeiro MT, Singh S, Guestrin C. Why should i trust you?: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2016. p. 1135–44.
- 26.
Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems. 2017. p. 4765–74. https://papers.nips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
- 27. Krittanawong C, Virk HUH, Bangalore S, Wang Z, Johnson KW, Pinotti R, et al. Machine learning prediction in cardiovascular diseases: a meta-analysis. Sci Rep. 2020;10(1):16057. pmid:32994452
- 28. Azmi J, Arif M, Nafis MT, Alam MA, Tanweer S, Wang G. A systematic review on machine learning approaches for cardiovascular disease prediction using medical big data. Med Eng Phys. 2022;105:103825. pmid:35781385
- 29. Naser MA, Majeed AA, Alsabah M, Al-Shaikhli TR, Kaky KM. A review of machine learning’s role in cardiovascular disease prediction: recent advances and future challenges. Algorithms. 2024;17(2):78.
- 30. Ogunpola A, Saeed F, Basurra S, Albarrak AM, Qasem SN. Machine learning-based predictive models for detection of cardiovascular diseases. Diagnostics (Basel). 2024;14(2):144. pmid:38248021
- 31. Subramani S, Varshney N, Anand MV, Soudagar MEM, Al-Keridis LA, Upadhyay TK, et al. Cardiovascular diseases prediction by machine learning incorporation with deep learning. Front Med (Lausanne). 2023;10:1150933. pmid:37138750
- 32. Yang R, Zhu M, Fan S, Zhang J. Niacin intake and mortality (total and cardiovascular disease) in patients with cardiovascular disease: insights from NHANES 2003-2018. Nutr J. 2024;23(1):123. pmid:39415265
- 33. Huang AA, Huang SY. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS One. 2023;18(2):e0281922. pmid:36821544