
Machine learning-based predictive modeling of angina pectoris in an elderly community-dwelling population: Results from the PoCOsteo study

  • Shahrokh Mousavi,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Biostatistics and Epidemiology, Faculty of Health and Nutrition, Bushehr University of Medical Sciences, Bushehr, Iran

  • Zahrasadat Jalalian,

    Roles Conceptualization, Project administration, Supervision, Validation, Writing – review & editing

    Affiliation School of Medicine, Bushehr University of Medical Sciences, Bushehr, Iran

  • Sima Afrashteh ,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing

    sima.afrashte3@gmail.com (SA); Ak.farhadi@gmail.com (AF)

    Affiliation Department of Biostatistics and Epidemiology, Faculty of Health and Nutrition, Bushehr University of Medical Sciences, Bushehr, Iran

  • Akram Farhadi ,

    Roles Conceptualization, Data curation, Resources, Supervision, Writing – review & editing

    sima.afrashte3@gmail.com (SA); Ak.farhadi@gmail.com (AF)

    Affiliation The Persian Gulf Tropical Medicine Research Center, The Persian Gulf Biomedical Sciences Research Institute, Bushehr University of Medical Sciences, Bushehr, Iran

  • Iraj Nabipour,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing – review & editing

    Affiliation The Persian Gulf Marine Biotechnology Research Center, The Persian Gulf Biomedical Sciences Research Institute, Bushehr University of Medical Sciences, Bushehr, Iran

  • Bagher Larijani

    Roles Data curation, Funding acquisition, Methodology, Resources, Writing – review & editing

    Affiliation Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran

Abstract

Background

Angina pectoris, a comparatively common complaint among older adults, is a critical warning sign of underlying coronary heart disease. We aimed to develop machine learning-based models using multiple algorithms to predict and identify the predictors of angina pectoris in an elderly community-dwelling population.

Methods

Medical records of 2000 participants in the PoCOsteo study between 2018 and 2021 were analyzed. The Rose Angina Questionnaire was used to indicate angina pectoris. Preprocessing was performed using imputation and scaling methods. We developed the following models: logistic regression (LR), multilayer perceptron (MLP), support vector machine (SVM), k-nearest neighbors (KNN), linear and quadratic discriminant analysis (LDA, QDA), decision tree (DT), and two ensemble models: random forest (RF) and adaptive boosting (AdaBoost). To address model complexity and parameter uncertainty, we performed hyperparameter tuning and compared the trade-offs between model performance and interpretability, in addition to applying ten-fold cross-validation. To determine the importance of each feature as a measure of their contribution to the models’ performance, we conducted the permutation feature importance technique.

Results

Participants had a mean age of 62.15 (± 8.07) years, and 57.1% were female; 88.4% did not have angina, 3.6% had probable angina, and 8% had definite angina. The bivariate analysis revealed significant correlations between RAQ and several other variables. LDA, RF, and LR had the highest AUC values, averaging 0.772, 0.770, and 0.764, respectively. These three models outperformed QDA (AUC 0.752), SVM (0.733), AdaBoost (0.726), KNN (0.697), MLP (0.697), and DT (0.644). Permutation feature importance revealed a handful of features that implicated the role of thrombotic vascular diseases, congestive heart failure, renal failure, and anemia.

Discussion

Our study demonstrated that LDA, RF, and LR not only provided strong predictive performance but also balanced model complexity with interpretability. The superior performance of these models could be largely attributed to their ability to capture the relevant linear, nonlinear, and interaction effects inherent in the clinical data, as well as the clinical relevance of key predictors like thrombotic vascular diseases, congestive heart failure, renal failure, and anemia. Future studies could incorporate more direct diagnostic methods to test our findings further and enhance the robustness of the predictive models developed.

Introduction

Coronary heart disease (CHD), a subset of cardiovascular diseases, is characterized by the narrowing or blockage of coronary arteries due to the accumulation of atherosclerotic plaques [1]. CHD typically develops over time as a result of chronic inflammatory responses, endothelial dysfunction, and the formation of lipid-rich plaques within the arterial walls [1]. Narrowing of the coronary arteries can lead to disruptions in the blood supply to the myocardium, leading to imbalances between tissue oxygen demand and the supplied oxygen and causing ischemic heart disease (IHD) [2].

According to the World Health Organization, IHD is the leading cause of death worldwide [3], with close to 126 million cases in 2020 (1,655 per 100,000) and nine million deaths globally, expected to grow larger in the future [2]. The burden of CHD extends beyond mortality, with millions of individuals living with the disease experiencing reduced quality of life, increased healthcare utilization, and economic repercussions [4,5].

Angina pectoris is a symptom of IHD characterized by chest pain or discomfort resulting from insufficient blood flow to the heart muscle. This condition manifests as a sensation of pressure, squeezing, or heaviness in the chest, which may also radiate to the shoulders, arms, neck, or jaw [6]. While angina itself is not a heart attack, it serves as a critical warning sign of underlying CHD [7]. Angina is especially a common complaint among older adults [8]. This condition not only serves as a warning sign of underlying heart disease in this vulnerable population but also poses unique challenges in diagnosis and management, as symptoms may be atypical or masked by other age-related health issues [9].

Machine learning (ML) has advanced considerably across many fields in recent years. For example, recent research in urban traffic forecasting has exploited multi-graph neural networks and spatio-temporal attention mechanisms to dynamically model complex interactions in traffic networks, yielding significant improvements in prediction accuracy and resource allocation efficiency [10]. In the realm of power systems, adaptive fuzzy backstepping and fractional-order control strategies have been developed to tackle challenges posed by cyber–physical security threats, such as intermittent denial-of-service attacks, thereby enhancing system stability and reducing overshoots and convergence times [11]. Similarly, breakthroughs in Internet of Things (IoT) applications have employed entropy-based multi-criteria decision-making techniques and energy-efficient wireless sensor networks to optimize e-commerce operations, addressing issues from data overload to customer personalization [12]. Furthermore, innovative fuzzy adaptive control methods have been applied for consensus tracking in incommensurate fractional-order systems, particularly in multiagent power systems, demonstrating both robustness and simplified controller designs [13]. Although these studies span diverse application areas, they underscore a common theme: by leveraging advanced modeling techniques and adaptive algorithms, significant gains in prediction accuracy, efficiency, and robustness can be achieved.

ML has also emerged as a powerful tool in healthcare, particularly for its ability to analyze complex datasets and uncover patterns that may not be readily apparent through traditional statistical methods. Nonetheless, there are still barriers to the widespread adoption of these models in clinical settings [14]. In recent years, ML techniques have been increasingly applied to predict or identify the predictors of angina pectoris in different populations [14–17].

Using different machine learning methods in this context offers the promise of transforming traditional risk assessment and prognostication in cardiovascular health. Unlike conventional analytical techniques that often assume linearity or require strict parametric conditions, ML algorithms are capable of capturing complex, multifactorial interactions inherent in clinical data. This flexibility facilitates the integrative analysis of diverse data types, enhancing the potential to discover novel relationships that drive angina pectoris. Moreover, by applying techniques such as permutation feature importance, our approach is not only able to identify key predictors with high clinical relevance but also to highlight potential targets for early intervention.

By leveraging large volumes of clinical data, including demographic information, medical history, and various physiological measurements, ML algorithms can potentially model the intricate relationships between multiple risk factors and the likelihood of developing angina. This study aimed to develop ML-based models for predicting angina pectoris in an elderly community-dwelling population surveyed through the PoCOsteo study using multiple algorithms. This study also sought to identify the most contributing factors in predicting angina pectoris via measuring each variable’s impact on the models’ performance through the permutation feature importance technique.

Methods

Research design and population

This study employed a cross-sectional design to analyze the data collected during the PoCOsteo study, a prospective cohort conducted in Austria (Graz Medical University) and Iran (Tehran University of Medical Sciences) [18]. The PoCOsteo study was an extension to phase II of the Bushehr Elderly Health (BEH) cohort [19]. The BEH cohort, conducted jointly by the Tehran University of Medical Sciences (Endocrinology and Metabolism Research Institute) and Bushehr University of Medical Sciences (Persian Gulf Marine Biotechnology Research Centre), aimed to investigate the prevalence of non-communicable diseases and their risk factors in the elderly. The PoCOsteo study, started in 2018, collected comprehensive baseline data on various health metrics and risk factors, with ongoing assessments and follow-ups for 12 months. The data analyzed here were extracted from a single time point of this larger prospective cohort; the current study is therefore designed as a cross-sectional investigation. The sample comprised n = 2000 participants with a mean age of 62.15 (± 8.07) years, of whom 57.1% were female. We accessed the data on May 22, 2024.

The Iranian part of this cohort included community-dwelling men and women aged 50 years and older residing in Bushehr, Iran, who had no plans to leave the city for at least five years after enrollment. Upon enrollment, the subjects (or an accompanying individual, typically a relative) provided informed consent. The cohort excluded older individuals residing in care facilities or those not physically or mentally capable of participating in the study. Participants were recruited through a multistage cluster random sampling method from 75 municipal blocks.

Following the granting of informed written consent, all individuals were interviewed and examined by qualified nurses to collect information on demographic status, lifestyle factors, general health, medical history, medication use, and mental and functional health. The enrollment process has previously been presented in more detail [19,20].

Data collection

This study employed a relatively large set of characteristics from the study population, including data on demographics, family history, past medical history, medication, potentially related symptoms, blood pressure measures, physical examination, and laboratory assessment. The authors did not have access to any information that would reveal the identities of individual participants during or following the data collection process. Family history questions focused on diabetes, hypertension, stroke, and myocardial infarction. Medical history included questions regarding cerebrovascular accidents, transient ischemic attacks, other neurologic conditions, heart failure, previous cardiac infarction, renal failure, malignancies, hypo/hyperthyroidism, hepatic disorders, pulmonary disorders, and the history of hospital admission in the preceding year.

This study has used the Rose Angina Questionnaire (RAQ) to indicate the presence of angina. This questionnaire is a standardized tool designed to assess the presence and severity of angina pectoris and myocardial infarction and is widely used in epidemiologic studies [21,22]. It consists of a series of questions that evaluate the frequency, duration, and intensity of chest pain episodes, as well as the circumstances under which they occur, such as physical exertion or emotional stress [23]. This questionnaire defines angina as chest pain that restricts physical activity, located over the sternum or in the left arm and chest, and which resolves after ten minutes of rest [23]. This questionnaire has been translated into Persian and approved as a reliable [24] and valid [25,26] tool in the Iranian population. To train the ML models, we dichotomized the participants using this questionnaire into either the “Angina” group (definite angina) or the “No Angina” group (probable or no angina) [22]. Patients with all the following characteristics were classified as “Angina” (definite angina):

  1. Chest pain when walking at an ordinary pace on the level or uphill or when hurrying
  2. Stopped walking or slowed down due to chest pain
  3. Chest pain is relieved in less than 10 minutes
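The dichotomization into "Angina" versus "No Angina" can be sketched as follows; the field names are hypothetical placeholders for the RAQ items, since the actual PoCOsteo variable names are not given here:

```python
# Each record holds the three Rose criteria (True if present).
# Field names are illustrative, not the study's actual variable names.
participants = [
    {"exertional_pain": True,  "stops_or_slows": True,  "relief_under_10min": True},
    {"exertional_pain": True,  "stops_or_slows": True,  "relief_under_10min": False},
    {"exertional_pain": False, "stops_or_slows": False, "relief_under_10min": False},
]

def definite_angina(p):
    # Classified as "Angina" (definite angina) only when ALL three
    # criteria are met; probable or no angina falls into "No Angina".
    return (p["exertional_pain"]
            and p["stops_or_slows"]
            and p["relief_under_10min"])

labels = ["Angina" if definite_angina(p) else "No Angina" for p in participants]
print(labels)  # ['Angina', 'No Angina', 'No Angina']
```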

The participants’ blood pressure (BP), including systolic and diastolic BP (SBP and DBP, respectively), was measured using a standard mercury sphygmomanometer after a 15-minute rest while seated. BP was measured for each patient twice on the right arm with a ten-minute interval. The average of the two readings was recorded as either SBP or DBP. Pulse pressure was calculated as the difference between SBP and DBP (PP = SBP – DBP) and proportional pulse pressure as PP divided by SBP [27].
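The averaging of the two readings and the derived pressure measures amount to simple arithmetic, sketched below:

```python
def pressure_measures(sbp_readings, dbp_readings):
    """Average the two BP readings, then derive pulse pressure (PP)
    and proportional pulse pressure (PP / SBP)."""
    sbp = sum(sbp_readings) / len(sbp_readings)
    dbp = sum(dbp_readings) / len(dbp_readings)
    pp = sbp - dbp
    return pp, pp / sbp

# Example: two readings of 130/82 and 126/78 mmHg.
pp, prop_pp = pressure_measures([130, 126], [82, 78])
print(pp, round(prop_pp, 3))  # 48.0 0.375
```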

For biochemical measurements, a qualified nurse collected 25 milliliters of venous blood from each subject after fasting for 8–10 hours. The laboratory work-up included complete blood count (white blood cell count, red blood cell count, hemoglobin, mean corpuscular volume, hematocrit, platelet count), serum levels of fasting blood sugar, hemoglobin A1c, lipid profile (total cholesterol, high-density lipoprotein, low-density lipoprotein, triglyceride), creatinine, urea, parathyroid hormone, and 25-OH vitamin D.

To measure the anthropometric indices, participants put on light clothing and removed their shoes. Using a digital scale and a fixed stadiometer, height (in centimeters) and weight (in kilograms) were measured by the standard procedure. The body mass index was then calculated as body weight (kilograms) divided by height (meters) squared. Additionally, right leg and arm circumference and length were measured along with neck and hip circumference (all in centimeters) using a flexible non-stretching tape measure. Body mass composition was also measured using dual x-ray absorptiometry (DXA, Discovery WI, Hologic, Bedford, Virginia, USA), from which we have utilized whole body fat mass, lean mass, and total mass [28].

Handgrip strength was measured using a standardized digital dynamometer. Participants were seated comfortably with the shoulder adducted and neutrally rotated, the elbow flexed at a 90° angle, and the forearm and wrist in a neutral position. Each participant was instructed to squeeze maximally for three seconds during each trial, with standardized verbal encouragement provided across all measurements. Three trials were conducted for each hand, with a resting period of at least 60 seconds between attempts to avoid muscle fatigue. The highest value from the trials was recorded as the participant’s maximum grip strength. Previous studies have demonstrated grip strength assessment to be a valid and reliable procedure among healthy and various clinical populations [29].

Collecting the data described above, we intentionally chose to forgo the use of automated methods for feature selection, a decision reached after consultations with experts in the field. We assert that while ML algorithms can offer automated selection techniques, they may not capture the full complexity and nuances of the data at hand.

Descriptive statistics

Upon collection, the data were summarized using descriptive statistics stratified by the RAQ result (no angina, probable angina, and definite angina). Then, each variable was compared across RAQ outcomes using the Kruskal-Wallis test for quantitative variables and the Chi-square test for qualitative variables. These steps were carried out in the Statistical Package for the Social Sciences (SPSS) v22. P-values less than 0.05 were considered statistically significant.

Preprocessing

In order to feed the data to the ML models, they were first cleaned and prepared as follows. All the steps henceforth were carried out in Python programming language v3.11.8. Fig 1 demonstrates the major steps taken throughout the study.

Given the sensitivity of the employed ML models to the data format, data cleaning was an essential process. To address the missing data in the dataset, an imputation method was used. When presenting data to an ML model for training or testing, missing values for some features may disrupt the model’s performance. Here, two approaches can be adopted. One is to completely exclude the data rows that do not have all features available, which would waste a significant portion of the data. The other is to fill in the missing values using imputation, an approximate method [30]. Implementing the second approach, we aimed for a method that balances simplicity and effectiveness. More complex methods such as KNN imputation or multiple imputation techniques, while potentially more accurate, also have drawbacks, e.g., sensitivity to the choice of distance metric and number of neighbors, or more demanding implementation and computation [31,32]. Ultimately, the function used in the present analysis was SimpleImputer from the scikit-learn library [33], due to its straightforward implementation, computational efficiency, and the adequacy of its performance for our specific dataset. This choice strikes a balance between maintaining data integrity and ensuring the reliability of our analysis, while preserving the overall distribution of the data and minimizing the bias that could arise from more complex imputation methods. This technique performs imputation using descriptive statistics, replacing missing values with the average of each column (strategy = ’mean’) for quantitative variables and with the most frequent value (strategy = ’most_frequent’) for qualitative variables.
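A minimal sketch of this imputation step, applying scikit-learn’s SimpleImputer with the two strategies described above to toy quantitative and qualitative columns:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Quantitative columns: replace NaN with the column mean.
X_num = np.array([[1.0, 10.0],
                  [np.nan, 30.0],
                  [3.0, np.nan]])
num_imputer = SimpleImputer(strategy="mean")
X_num_filled = num_imputer.fit_transform(X_num)
print(X_num_filled)  # NaNs become the column means (2.0 and 20.0)

# Qualitative columns: replace missing entries with the most frequent category.
X_cat = np.array([["M"], ["F"], [np.nan], ["F"]], dtype=object)
cat_imputer = SimpleImputer(strategy="most_frequent")
X_cat_filled = cat_imputer.fit_transform(X_cat).ravel()
print(X_cat_filled)  # the missing entry becomes 'F'
```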

Next, due to the different scales of the features, it was necessary to normalize the data to prevent distortion of model performance and to make the training process shorter and more efficient. The StandardScaler function, accessible from the preprocessing module of the scikit-learn library [33], was applied to the predictor variables. This function uses the following formula for normalization:

Z = (x − u) / s

where Z is the normalized value, x is the original value, u is the arithmetic mean of the feature, and s is the standard deviation. The training and test data were normalized separately to prevent information leakage from the test data to the training data.
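A minimal sketch of the scaling step; here the scaler statistics (u and s) come from the training data only and are reused on the test data, one common way of keeping the two sets separate:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [4.0]])

scaler = StandardScaler()
# Fit on the training data only, then transform both sets, so the
# test set contributes nothing to the mean u and std s used in Z = (x - u) / s.
X_train_z = scaler.fit_transform(X_train)
X_test_z = scaler.transform(X_test)

print(scaler.mean_, scaler.scale_)  # per-feature u and s from the training set
print(X_train_z.ravel())            # training values standardized around 0
```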

Finally, all the qualitative variables were encoded as categorical variables using the function “to_categorical” from the utils module of the Keras library.
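The to_categorical function one-hot encodes integer class labels; a plain NumPy equivalent is shown below for illustration (the study itself used the Keras implementation):

```python
import numpy as np

def one_hot(labels, num_classes):
    """NumPy sketch of what keras.utils.to_categorical does for
    integer labels: one row per sample, a 1.0 in the label's column."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

print(one_hot([0, 2, 1], 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```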

Machine learning models

Different algorithms have varying strengths and weaknesses, and their performance can depend significantly on the nature of the data and the specific problem being addressed [34]. By exploring a diverse set of algorithms, we can identify which model best captures the underlying patterns in the data, leading to improved accuracy and generalization. Additionally, using multiple models allows for a more robust evaluation of the results, as it helps to mitigate the risk of overfitting to a particular algorithm. This approach also facilitates the comparison of model performance metrics. Accordingly, we developed models of logistic regression (LR), artificial neural network (ANN), support vector machine (SVM), k-nearest neighbors (KNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), decision tree (DT), random forest (RF), and adaptive boosting (AdaBoost).

  • Logistic Regression: LR applies the logistic function to a linear combination of the input features, transforming the output into a value between 0 and 1, which can be interpreted as a probability. The model estimates the relationship between the independent variables and the dependent binary outcome, allowing for the identification of significant predictors [35]. Logistic regression is favored for its simplicity, interpretability, and efficiency, making it a popular choice in various fields. Hyperparameter tuning for this model was performed through randomized grid search optimizing for penalty, solver, C, max_iter, and l1_ratio.
  • Artificial neural network: This model was designed as a Multi-Layer Perceptron (MLP). MLPs are widely used in various applications, including classification, regression, and pattern recognition tasks, making them a foundational model in deep learning [36,37]. Hyperparameter tuning for the MLP was performed through randomized search using the keras-tuner engine, testing the performance (AUC) of models with different numbers of layers and units per layer.
  • Support Vector Machine: A potent supervised ML technique, SVM is mainly utilized for classification, while it can also be used for regression issues. The core idea behind SVM is to find the optimal hyperplane that best separates data points of different classes in a high-dimensional space. We have tuned, trained, and tested this model using the scikit-learn library [33]. Hyperparameter tuning for this model was performed through randomized grid search optimizing for C, gamma, kernel, degree, and coef0.
  • Linear and Quadratic Discriminant Analysis: By maximizing the ratio of between-class variance to within-class variance, LDA identifies the optimal projection that enhances class separability. This makes LDA particularly effective for problems where the classes are linearly separable [34]. QDA, on the other hand, is a classification technique that extends LDA by allowing each class to have its own covariance matrix. This means that QDA can model more complex decision boundaries, as it does not assume that the classes share the same variance structure. Like LDA, QDA also relies on the assumption that the data follows a Gaussian distribution, but it uses quadratic functions to define the decision boundaries. This flexibility allows QDA to perform better than LDA in situations where the classes are not linearly separable, making it suitable for more complex datasets [34]. Tuned hyperparameters were solver and shrinkage for LDA, and reg_param and tol for QDA.
  • Decision Tree: A decision tree aims to create a model that predicts the target variable by learning simple decision rules inferred from the data. However, they can be prone to overfitting, especially when the tree is deep and complex [38]. To overcome this challenge, we have used two ensemble models that utilize decision trees: random forest and AdaBoost. Randomized grid search was performed to tune decision tree hyperparameters criterion, splitter, max_depth, min_samples_split, min_samples_leaf, max_features, and class_weight.
  • Random forest: It is an ensemble learning method that builds upon the concept of decision trees to improve predictive accuracy and control overfitting. It constructs a multitude of decision trees during training and outputs the mode of their predictions (for classification) or the average (for regression). Each tree in a random forest is trained on a random subset of the data and a random subset of features, which introduces diversity among the trees and helps to reduce variance. This ensemble approach enhances the model’s robustness and generalization capabilities, making random forests highly effective for a broad variety of applications [39]. Hyperparameter tuning for this model was performed through randomized grid search optimizing for n_estimators, max_features, max_depth, min_samples_split, min_samples_leaf, and bootstrap.
  • Adaptive Boosting: AdaBoost is another ensemble learning technique that combines multiple weak classifiers to create a strong classifier. When used with decision trees, AdaBoost typically employs shallow trees, often referred to as “stumps,” as the base learners. AdaBoost is particularly effective when the data is noisy, or the model needs to adapt to complex patterns [40]. AdaBoost was tuned for hyperparameters n_estimators, learning_rate, estimator__max_depth, estimator__min_samples_split, and estimator__min_samples_leaf.
  • K-Nearest Neighbors: The core idea behind KNN is to make predictions based on the ‘k’ closest data points in the feature space to a given input instance. One of the key advantages of KNN is its non-parametric nature, meaning it makes no assumptions about the underlying data distribution [41]. KNN was tuned for hyperparameters n_neighbors, weights, and metric.

All models were developed using the scikit-learn library v 1.5.0 [33] except the ANN, which was developed with the Keras library v 3.3.3 [36]. Similarly, the hyperparameters were tuned for all models via the function RandomizedSearchCV from the scikit-learn library, and via the keras-tuner library v1.4.7 for the ANN, which is integrated with both the Keras and scikit-learn libraries [42]. While scikit-learn is chiefly dedicated to traditional ML algorithms, Keras provides several advantages over it for developing ANNs, of which flexibility, customization, scalability, support for production deployment, and integration with deep learning frameworks are the most prominent [36]. Table 1 presents the optimized settings for the hyperparameters used in developing each model.
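As an illustration of the randomized hyperparameter search, the sketch below tunes a logistic regression on synthetic data; the search space shown is illustrative only, not the study’s actual grid (which is summarized in Table 1):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data; the study used the PoCOsteo features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Illustrative distributions over a subset of the LR hyperparameters
# mentioned in the text (penalty, solver, C, max_iter).
param_distributions = {
    "C": loguniform(1e-3, 1e2),
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],   # supports both l1 and l2 penalties
    "max_iter": [200, 500, 1000],
}

# Sample 20 random configurations, scoring each by cross-validated AUC.
search = RandomizedSearchCV(
    LogisticRegression(),
    param_distributions,
    n_iter=20,
    scoring="roc_auc",
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```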

Table 1. The hyperparameters for each model based on the random hyperparameter search.

https://doi.org/10.1371/journal.pone.0329023.t001

Model development and evaluation

All models were trained and evaluated through the commonly used method of 10-fold cross-validation [43], a robust statistical technique for evaluating the performance of ML models and ensuring their generalizability to unseen data. This technique randomly divides the dataset into ten equal-sized subsets, or “folds.” The model is then trained and validated on nine of these folds, while the remaining fold is used as the test set to assess the model’s performance. Accordingly, the data were divided into training (64%), validation (16%), and test (20%) sets. Early stopping, monitoring the validation-set loss, was used to prevent overfitting. This procedure is carried out ten times, with each fold serving as the test set once. The final performance metric is obtained by averaging the results from all ten iterations. This approach helps mitigate the risk of overfitting, as it provides a more comprehensive evaluation of the model’s ability to generalize across different subsets of the data [43].
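A minimal sketch of the cross-validation scheme, using stratified folds on synthetic imbalanced data (the fold-wise split into train/validation/test used for early stopping is omitted for brevity):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data with a rare positive class, loosely mimicking
# the study's ~8% definite-angina prevalence.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Stratified 10-fold CV keeps the positive-class proportion in each fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                       cv=cv, scoring="roc_auc")

# The reported metric is the mean across the ten held-out folds.
print(round(aucs.mean(), 3), round(aucs.std(), 3))
```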

To provide a basis for comparing the models’ performance, the following metrics were calculated for each model in each fold, and the average values were reported: area under the curve (AUC), Youden index, accuracy, sensitivity, and specificity. AUC is derived from the receiver operating characteristic (ROC) curve and quantifies the model’s ability to discriminate between positive and negative classes across various threshold settings. The Youden index, calculated as the sum of sensitivity and specificity minus one, provides a single measure that captures the model’s overall effectiveness in distinguishing between classes, with higher values indicating better performance [44]. We have measured this index at the J point, the optimal threshold for classification. We have also measured the models’ accuracy (the proportion of true positive and true negative predictions among the total predictions), sensitivity (the model’s ability to correctly identify positive instances), and specificity (the model’s ability to correctly identify negative instances).
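Since Youden’s J equals sensitivity + specificity − 1 = TPR − FPR at each threshold, the J point can be located directly from the ROC curve; a small worked example on toy scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and predicted probabilities for illustration.
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6, 0.2, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
# Youden's J = TPR - FPR; the J point is the threshold maximizing it.
j = tpr - fpr
best = np.argmax(j)
print(thresholds[best], round(j[best], 3))  # optimal cutoff and its J value
```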

To determine each feature’s importance as a measure of its contribution to the models’ performance, we conducted the permutation feature importance technique. Through this technique, the values of a specific feature are randomly shuffled while measuring the change in a model’s performance metric (AUC in our study), thereby assessing the contribution of that feature to the model’s predictive power. We then ranked features based on their importance for each model. Using the results from top three models, the numeric values of the rank of each feature were summed across the three models, and eventually, the features were sorted from the smallest summed indices (most important feature) to the largest summed indices (least important feature), where the results were tabulated and presented accordingly. This technique has been adopted previously [45,46].
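A minimal sketch of the permutation feature importance procedure; the breast cancer dataset below is a public stand-in, not the study data, and the AUC scorer mirrors the metric used in the text:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Public stand-in dataset with named features.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Shuffle each feature in turn and record the drop in held-out AUC;
# larger drops mean the model relied more on that feature.
result = permutation_importance(model, X_test, y_test,
                                scoring="roc_auc", n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]
for i in ranking[:5]:
    print(X.columns[i], round(result.importances_mean[i], 4))
```

Ranks obtained this way per model can then be summed across models, as described above, to produce a consensus ordering.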

Ethics statement

This study was carried out in accordance with the Declaration of Helsinki and the relevant guidelines. Due to the retrospective nature of the study, the Research Ethics Committee of Bushehr University of Medical Sciences (approval ID IR.BPUMS.REC.1403.024) waived the need for informed consent.

Results

This study included n = 2000 participants who completed the RAQ. The mean age of the participants was 62.15 (± 8.07) years, and most (57.1%) were female. The average duration of education was 8 (± 5) years. According to the RAQ, most participants (88.4%) did not have angina, 3.6% had probable angina, and 8% had definite angina. The bivariate analysis revealed significant correlations between RAQ and several other variables, including gender, education, DBP, most current related symptoms, most physical examination variables, some variables in past medical history and medication, and a few laboratory assessments (Table 2).

Table 2. Characteristics of the participants, Rose Angina Questionnaire completed, n = 2000.

https://doi.org/10.1371/journal.pone.0329023.t002

Models’ performance

The performance metrics of the ML algorithms were evaluated through a 10-fold cross-validation process (Table 3). Among the algorithms assessed, LDA achieved the highest mean AUC of 77.2% (95% CI: 72.7%–81.7%), indicating robust performance, followed by RF (mean AUC 77.0%, 95% CI: 74.3%–79.7%) and LR (mean AUC 76.4%, 95% CI: 71.3%–81.5%). All models showed acceptable discrimination except KNN, DT, and ANN, whose discrimination was poor (AUC below 70%). Regarding accuracy, LR outperformed the other models with a mean accuracy of 73.9% (± 8.5%). Sensitivity was highest in the SVM model at 79.2% (± 18.0%), while QDA exhibited the best specificity at 77.3% (± 12.0%). Conversely, the DT algorithm demonstrated the lowest performance across all metrics, particularly AUC (64.4% ± 9.5%) and Youden’s index (0.324 ± 0.133). These results highlight the varying effectiveness of the different algorithms in predictive modeling. Based on the objectives of this study, we chose the three best-performing models (highest AUCs) for further analysis and interpretation: LDA, RF, and LR (Fig 2).

Table 3. Performance metrics of the trained models across cross-validation folds (n = 10).

https://doi.org/10.1371/journal.pone.0329023.t003

Fig 2. Receiver operating characteristic curve for the models with the highest mean area under curve (AUC) across ten folds of cross validation.

https://doi.org/10.1371/journal.pone.0329023.g002

Feature importance

Table 4 highlights the ranking and description of each feature, emphasizing their significance across the models. Notably, “Uneven heartbeat” consistently ranked first in all three models, indicating its critical role in predicting outcomes related to the studied condition. Other prominent features included “Normal activity dyspnea” and “Congestive heart failure,” underscoring their relevance in the predictive framework. “Paresthesia” showed variability in importance, ranking second in Random Forest, third in Logistic Regression, and ninth in Linear Discriminant Analysis, suggesting its varying impact depending on the model used. Additionally, features such as “Hospital admissions” and “Any dyspnea” maintained high rankings, further supporting their importance in clinical assessments. The inclusion of laboratory metrics like “Red blood cell count” and “Creatinine” also reflected the models’ reliance on both clinical symptoms and laboratory values. Overall, the consistent ranking of these features across all three models highlighted their potential as key indicators in the predictive analysis, providing useful insights for clinical decision-making and further research.
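The cross-model intersection of top-ranked features can be sketched as follows; the synthetic data, the generic feature names, and the top-5 cutoff are illustrative assumptions (the study used the top 20 across three models).

```python
# Sketch of a cross-model top-k feature intersection (as in Table 4).
# Data and the generic feature names are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X = StandardScaler().fit_transform(X)  # put LR coefficients on one scale
names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(random_state=0).fit(X, y)
lr = LogisticRegression(max_iter=1000).fit(X, y)

# Rank by impurity-based importance (RF) and absolute coefficient (LR),
# then keep only the features shared by both top-5 lists.
top_rf = {names[i] for i in np.argsort(rf.feature_importances_)[::-1][:5]}
top_lr = {names[i] for i in np.argsort(np.abs(lr.coef_[0]))[::-1][:5]}
print(sorted(top_rf & top_lr))
```

Standardizing before the logistic fit matters: otherwise coefficient magnitude reflects feature scale rather than predictive contribution.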

Table 4. The intersection of the most contributing features, ranked among the top 20 important features in all three best-performing models.

https://doi.org/10.1371/journal.pone.0329023.t004

Discussion

The goal of this study was to tap into the capabilities of ML-based models in identifying potential predictive factors of angina pectoris in the PoCOsteo study. The evaluation of ML algorithms through a 10-fold cross-validation process revealed differences in performance metrics. Among the algorithms tested, LDA emerged as the most effective (highest AUC), underscoring its good predictive capability. Conversely, yielding the lowest AUC, the decision tree algorithm consistently underperformed across all metrics.

The primary distinction between LDA and QDA lies in their assumptions about the covariance of the classes. LDA assumes that all classes share the same covariance matrix, leading to linear decision boundaries, while QDA allows for different covariance matrices for each class, resulting in quadratic decision boundaries. Consequently, LDA is generally more efficient and requires fewer parameters, making it preferable for high-dimensional datasets with limited samples. In contrast, QDA can capture more complex relationships in the data at the cost of increased computational complexity and the risk of overfitting when the sample size is small [34].
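This difference in covariance assumptions is visible directly in the fitted estimators; the snippet below is a sketch on assumed synthetic data, not the study’s dataset.

```python
# LDA pools a single covariance matrix across classes (linear boundary);
# QDA estimates one per class (quadratic boundary). Synthetic data assumed.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

X, y = make_classification(n_samples=300, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, random_state=1)

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)

print(lda.covariance_.shape)  # one pooled 5x5 covariance matrix
print(len(qda.covariance_))   # one covariance matrix per class (2 here)
```

The extra per-class covariance parameters are what give QDA its flexibility, and also why it overfits more easily on small samples.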

Random Forest is an ensemble method that works well with structured data, especially in the presence of complex interactions and non-linear relationships. It ranked second in our study, with performance close to that of the LDA model. By combining a multitude of weak decision trees, random forests are less prone to overfitting [47].

The high sensitivity of the SVM model indicates its effectiveness in correctly identifying positive instances. SVMs are particularly powerful in high-dimensional spaces and can handle non-linear relationships through kernel functions, which may explain their ability to capture more complex patterns in the data [48].
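The kernel point can be illustrated on a toy dataset with a non-linear class boundary; the concentric-circles data below is an assumed example, unrelated to the study’s features.

```python
# Kernel trick illustration: a linear SVM cannot separate concentric
# circles, while an RBF-kernel SVM can. Synthetic toy data (assumption).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print(f"linear: {linear.score(X, y):.2f}  rbf: {rbf.score(X, y):.2f}")
```

The RBF kernel implicitly maps the points into a space where the inner and outer rings become linearly separable, which a linear boundary in the original two dimensions cannot achieve.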

The three poorest-performing models in this study were KNN, decision tree, and ANN. KNN is sensitive to the choice of distance metric and the number of neighbors, which can lead to overfitting or underfitting, especially in high-dimensional spaces [31]. Decision trees, despite being able to capture complex patterns in the data, are also prone to overfitting, especially when allowed to grow deep without constraints [38]; it should be noted that measures were taken during each model’s training to mitigate this. In addition, decision trees have high variance: small changes in the training data can produce substantially different tree structures [49], yielding a model that performs well on the training set but poorly on the validation or test set. ANN models with a multi-layer perceptron architecture can easily overfit the training data, particularly when the dataset is relatively small (2,000 rows in our case) [50]. The numerous categorical variables also favor other algorithms, e.g., random forests. An ensemble model such as a random forest, which averages the predictions of many decision trees, is therefore less prone to overfitting [47].
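The depth-constraint point above can be demonstrated concretely; the synthetic data with injected label noise is an assumption for illustration, and `max_depth=4` is an arbitrary example cap, not the tuning used in the study.

```python
# Overfitting illustration: an unconstrained decision tree memorizes noisy
# training labels (train accuracy 1.0), while capping max_depth trades a
# little training fit for better generalization. Synthetic data assumed.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           flip_y=0.2, random_state=0)  # 20% label noise
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)  # unconstrained
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xtr, ytr)

print("deep   ", deep.score(Xtr, ytr), round(deep.score(Xte, yte), 2))
print("shallow", round(shallow.score(Xtr, ytr), 2),
      round(shallow.score(Xte, yte), 2))
```

Because the features are continuous, the unconstrained tree reaches perfect training accuracy by splitting down to pure leaves, and its test accuracy drops accordingly on the noisy labels.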

Recently, ML techniques have been applied to predict or identify the predictors of angina pectoris in different populations at an increasing pace. For instance, Ahmad et al. developed a point-of-care tool for 1,893 patients with non-obstructive coronary artery disease, which achieved an AUC of 0.67 for predicting coronary microvascular dysfunction [51]. Another study by Kim et al. evaluated eight machine learning models to predict stable obstructive coronary artery disease in 1,312 patients, with the CatBoost algorithm achieving an AUC of 0.796, outperforming traditional coronary artery disease pre-test probability models [17]. In our study, we focused on a community-dwelling elderly population, analyzing medical records of 2,000 participants to predict angina pectoris. Our findings revealed that LDA, RF, and LR models achieved AUC values averaging 0.772, 0.770, and 0.764, respectively. The differences in AUC values across these studies may be attributed to several factors. Firstly, the populations studied vary significantly; Ahmad [51] focused on patients with non-obstructive coronary artery disease, while Kim [17] examined those with stable obstructive coronary artery disease. Our study specifically targeted an elderly demographic, which may present unique risk factors and clinical characteristics influencing the predictive performance of the models. Additionally, the features selected for model training and the data preprocessing techniques employed also differ, impacting the models’ ability to capture relevant patterns in the data. Lastly, the choice of machine learning algorithms and their respective tuning parameters can also lead to variations in performance.

Some other studies have tried to predict different outcomes and therefore have achieved different performances for the trained models. Wang et al. developed a risk score model for predicting composite cardiovascular events in 690 patients, with the Light Gradient Boosting Machine achieving an AUC of 0.95 [52]. Another study by Zhu et al. created cardiovascular disease risk prediction models using electronic medical records from 8,894 patients with chronic kidney disease, achieving an AUC of 0.89 with an Extreme Gradient Boosting model [16]. Wang and Zhu focused on composite cardiovascular events and broader disease risk, which involved more predictive features, resulting in higher AUCs. In contrast, our study on angina pectoris, a more specific condition, may limit the complexity of predictive relationships. Additionally, the populations differ in addition to different sample sizes, which can further influence the predictive performance of the models.

Assessment of the most contributing factors in our study revealed features from medical history, related symptoms, and laboratory assessments that consistently ranked among the top 20 most important features across the best-performing models. A notable predictor was the sensation of irregular heart rhythms, specifically tachycardia and arrhythmia. The consistent ranking of “uneven heartbeat” as the most significant feature across all three models underscores its critical role in identifying individuals at risk for angina pectoris. These symptoms suggest that patients experiencing uneven heartbeats may be at elevated risk for cardiac events. This finding aligns well with the existing literature, which recognizes arrhythmias as important indicators of cardiovascular events, suggesting that clinicians should prioritize monitoring this symptom in elderly patients [53–56].

The history of neurological symptoms, such as hemiparesthesia, was also noted as a potential predictor. This symptom may indicate an underlying thrombotic vascular disease, e.g., thrombotic microangiopathy, that compromises the blood flow [57]. In patients predisposed to thrombosis, such as those with arrhythmias, stagnant blood flow can lead to thrombus formation and eventually cause ischemic events, including angina [58]. Thus, previous experience with hemiparesthesia may serve as a critical warning sign of potential thrombogenic susceptibility [59], prompting timely evaluation and intervention in case of other symptoms to reduce the risk of serious cardiac complications. Our finding regarding the significance of thrombogenicity in CHD aligns well with the existing literature, as it has been supported by previous studies [60]. This finding is further supported by the observations that show an elevated thrombogenicity during acute myocardial infarction compared to stable coronary artery diseases [61].

A number of other symptoms related to congestive heart failure appeared as important features. Dyspnea, particularly during normal activities, the use of medication for congestive heart failure, and nocturia (suggestive of fluid overload) emerged as critical features predicting angina pectoris. This suggests that deteriorating cardiac function can be associated with the onset of angina. This association can also hold true in reverse, as coronary heart disease can, over time, result in congestive heart failure [62].

Furthermore, a history of hospital admissions in the previous year (lasting at least 24 hours) was found to be an important feature for the prediction of angina. Frequent hospitalizations may reflect severe underlying health issues, including cardiovascular complications, thereby highlighting the relevance of prior health crises in assessing future angina risk.

Laboratory findings, including red blood cell count and serum creatinine levels, further elucidate the risk profile for angina. Anemia or red blood cell count variations can impair oxygen delivery to tissues [63]. Anemia has long been recognized as a risk factor for cardiovascular diseases and their progression in previous studies [64–67]. It is also reported to be an independent predictor of mortality and adverse outcomes, especially among elderly patients with coronary artery disease [63].

Furthermore, elevated creatinine levels indicate renal impairment, frequently associated with cardiovascular disease [68]. Similar to this finding of our study, it has been observed that the presence of slightly reduced kidney function along with anemia is linked to a higher likelihood of coronary heart disease events and increased mortality [69]. Both factors may exacerbate existing heart conditions and increase the likelihood of angina.

Overall, the consistent ranking of the mentioned features across the models not only highlights their potential as key indicators in future research aimed at exploring the interplay of these factors in greater depth but also provides actionable insights for clinical decision-making. By integrating the identified predictive factors into clinical practice, healthcare providers can improve early detection of angina and other cardiac events, ultimately leading to better management strategies, reduced morbidity, and improved quality of life for patients. Additionally, these insights can inform public health initiatives aimed at educating both patients and healthcare professionals about the importance of recognizing and addressing early signs of coronary heart disease.

The univariate analysis revealed significant findings, including associations between a number of underlying diseases and angina. Although univariate analysis is limited by its inability to account for confounding and interaction effects, some of its results are nevertheless discussed here. These findings were mostly congruent with the existing literature, such as those regarding hypertension [70], hyperthyroidism [71,72], hepatic disease [73,74], pulmonary disease [75], and depression [76,77]. The anthropometric indices (height, hip circumference, arm and leg lengths, neck circumference, body mass index, lean and fat mass) were also found to be associated with angina pectoris. These indices suggest a clinical picture in which participants with central obesity are more susceptible to developing angina pectoris, an association that has been extensively documented previously [78–80]. Similar significant results were found for handgrip strength: those with angina tended to have weaker handgrips. This finding is also consistent with previously published studies [81–83], highlighting the role of physical strength and exercise capacity in coronary artery disease.

The univariate analysis also revealed significant differences among different classes of angina in relation to a number of variables (diabetes, dyslipidemia, and fasting blood sugar), indicating the significance of metabolic syndrome as a risk factor for cardiovascular diseases. These results echo broader findings in the literature where metabolic syndrome has been robustly linked with an elevated risk of coronary artery disease. For instance, a recent meta-analysis demonstrated that individuals with metabolic syndrome are at substantially higher risk of developing coronary artery disease (approximately four times) compared to those without metabolic syndrome [84]. This meta-analysis further underscored that even the individual components of metabolic syndrome are significantly associated with coronary artery disease risk.

Another valuable finding concerned obstructive sleep apnea (OSA) in relation to angina pectoris, suggesting a higher prevalence of OSA in participants with angina pectoris. Globally, OSA is a substantial burden in the elderly, with a prevalence of approximately 35.9% in this population [85]. OSA has been increasingly recognized as a significant factor in the development and exacerbation of angina pectoris [86]. Machado [87] found in 2024 that a high BOAH score (Body mass index, Observed apnea, Age, and Hypertension), an indicator of obstructive sleep apnea risk, is significantly associated with a greater likelihood of angina, with increased snoring frequency, age, and shorter sleep duration further strengthening this relationship. Moreover, in a 2023 cohort of 2,990 participants from the Sleep Heart Health Study [88], suboptimal sleep efficiency, measured by baseline polysomnography, was associated with a higher risk of developing angina pectoris, particularly among hypertensive individuals, compared with optimal sleep efficiency. A cross-sectional NHANES study (National Health and Nutrition Examination Survey) found in 2024 that individuals with probable OSA had significantly higher risks of cardiovascular events [89]. The mentioned studies systematically targeted OSA and therefore differ from our study methodologically. The populations also differ, as an extremely high prevalence of OSA has been reported in Iran (44% in the general population and 55% among people with cardiovascular diseases), surpassing other countries [90]. Nevertheless, our results concur with those of these studies concerning the association between OSA and angina pectoris.

The intermittent hypoxia and sleep fragmentation caused by OSA can lead to heightened sympathetic nervous system activity, increased blood pressure, and oxidative stress, all of which contribute to endothelial dysfunction and systemic inflammation. These pathophysiological changes promote atherosclerosis and impair myocardial perfusion, thereby elevating the risk of ischemic events such as angina pectoris [91]. Furthermore, OSA-related cardiovascular stress may exacerbate existing coronary artery disease, resulting in a higher incidence and severity of angina symptoms in affected patients.

The strengths of this study lie in the utilization of multiple ML models, which allows for a comprehensive evaluation of predictive factors associated with angina pectoris. By employing cross-validation techniques, we ensured the generalizability of our findings, minimizing the risk of overfitting and enhancing the reliability of the model performance. Additionally, hyperparameter tuning was conducted to optimize the models, ensuring that we achieved the best possible predictive accuracy. Other strengths of this study include its utilization of a well-defined cohort from the PoCOsteo study, which provides a rich dataset on the medical condition of the elderly population. The analysis incorporates a relatively high number of variables, enabling us to capture complex interactions and relationships that may influence cardiovascular health.

To effectively integrate our predictive models into routine clinical workflows and decision-support systems, it is essential to consider the practical application of these tools in everyday clinical settings. The models developed in this study, particularly the LR, LDA, and RF models, demonstrate not only strong predictive performance but also a level of interpretability that is crucial for clinical adoption. Instances include integrating these models into electronic health record systems to enable real-time risk assessments during patient evaluations, allowing clinicians to identify at-risk individuals promptly and suggest further assessments/interventions accordingly. Additionally, training programs for healthcare providers on how to utilize these predictive tools effectively can enhance their confidence in employing data-driven approaches in clinical practice. Ultimately, the successful integration of these models into clinical workflows has the potential to improve patient outcomes by enabling earlier detection and management of angina pectoris, thereby addressing a critical warning sign of underlying coronary heart disease in older adults.

Limitations

The most significant limitation of our study, given its epidemiological design, was the lack of access to the gold standard for diagnosing coronary heart disease, namely coronary angiography. This limitation might affect the definitive classification of participants with respect to their CHD status. However, substantial evidence supports the validity of the Rose Angina Questionnaire as a reliable tool for predicting CHD in similar settings [92–94]. The RAQ has been widely utilized in epidemiological studies, making it a suitable alternative for our analysis. Consequently, we adopted the RAQ as the primary outcome measure for our ML-based models. While this approach did not provide the same level of diagnostic certainty as coronary angiography, it allowed us to leverage existing data effectively and draw meaningful conclusions regarding the predictive factors associated with angina pectoris in our cohort.

It is also important to highlight the retrospective nature of the study as another limitation, which might introduce biases inherent in the data collection process. These biases might affect the validity of the findings and their applicability to current clinical practices. Furthermore, the reliance on historical data restricts the generalizability of the results. As such, caution should be exercised when interpreting the outcomes, and further prospective studies are warranted to validate these findings and enhance their relevance in contemporary settings. Additionally, the lack of external validation raises concerns about the robustness and applicability of the results to broader populations or different settings. A final methodological limitation concerns the risk of overfitting: although ensemble models and cross-validation techniques were employed to enhance predictive performance, some risk of overfitting remained, which could compromise the models’ ability to generalize to unseen data and thereby affect the reliability of the conclusions drawn from the analysis.

Conclusion

In conclusion, our study demonstrated that ML models – particularly LDA, RF, and LR – could effectively identify key predictors of angina pectoris in an elderly population. Notably, factors related to thrombotic vascular diseases, congestive heart failure, renal failure, and anemia emerged as critical, providing actionable insights for early intervention. These findings not only reinforced the clinical relevance of routinely monitoring symptoms such as arrhythmia and uneven heartbeat, but also supported the integration of ML-based risk stratification methods into clinical practice. By identifying high-risk patients, clinicians can plan preventive strategies, such as recommending further diagnostic evaluations including polysomnography for sleep-related risk factors or developing targeted intervention programs. Overall, our results have important implications for improving the early detection and management of coronary heart disease, thereby informing both clinical decision-making and public health initiatives aimed at reducing cardiovascular morbidity. Future studies could benefit from incorporating more direct diagnostic methods to further validate our findings and enhance the robustness of the predictive models developed.

References

  1. Malakar AK, Choudhury D, Halder B, Paul P, Uddin A, Chakraborty S. A review on coronary artery disease, its risk factors, and therapeutics. J Cell Physiol. 2019;234(10):16812–23.
  2. Khan MA, Hashim MJ, Mustafa H, Baniyas MY, Al Suwaidi S, Al Katheeri R. Global epidemiology of ischemic heart disease: results from the global burden of disease study. Cureus. 2020;12(7):e9349.
  3. Nowbar AN, Gitto M, Howard JP, Francis DP, Al-Lamee R. Mortality from ischemic heart disease. Circ Cardiovasc Qual Outcomes. 2019;12(6):e005375. pmid:31163980
  4. Barr DA. Geography as disparity. Circulation. 2016;133(12):1151–4.
  5. Gillum RF, Mehari A, Curry B, Obisesan TO. Racial and geographic variation in coronary heart disease mortality trends. BMC Public Health. 2012;12:410. pmid:22672746
  6. Fuchs RM, Becker LC. Pathogenesis of angina pectoris. Arch Intern Med. 1982;142(9):1685–92. pmid:7052007
  7. Kloner RA, Chaitman B. Angina and its management. J Cardiovasc Pharmacol Ther. 2017;22(3):199–209.
  8. Mittelmark MB, Psaty BM, Rautaharju PM, Fried LP, Borhani NO, Tracy RP, et al. Prevalence of cardiovascular diseases among older adults. The Cardiovascular Health Study. Am J Epidemiol. 1993;137(3):311–7. pmid:8452139
  9. Nanna MG, Wang SY, Damluji AA. Management of stable angina in the older adult population. Circ Cardiovasc Interv. 2023;16(4):e012438. pmid:36916288
  10. Ali A, Ullah I, Shabaz M, Sharafian A, Khan MA, Bai X. A resource-aware multi-graph neural network for urban traffic flow prediction in multi-access edge computing systems. IEEE Trans Consumer Electron. 2024;70(4):7252–65.
  11. Sharafian A, Ullah I, Singh SK, Ali A, Khan H, Bai X. Adaptive fuzzy backstepping secure control for incommensurate fractional order cyber–physical power systems under intermittent denial of service attacks. Chaos Soliton Fract. 2024;186:115288.
  12. Ullah I, Adhikari D, Ali F, Ali A, Khan H, Sharafian A. Revolutionizing E-commerce with consumer-driven energy-efficient WSNs: a multi-characteristics approach. IEEE Trans Consumer Electron. 2024;70(4):6871–82.
  13. Sharafian A, Ali A, Ullah I, Khalifa TR, Bai X, Qiu L. Fuzzy adaptive control for consensus tracking in multiagent systems with incommensurate fractional-order dynamics: application to power systems. Inf Sci. 2025;689:121455.
  14. Stewart J, Lu J, Goudie A, Bennamoun M, Sprivulis P, Sanfillipo F, et al. Applications of machine learning to undifferentiated chest pain in the emergency department: a systematic review. PLoS One. 2021;16(8):e0252612. pmid:34428208
  15. Guldogan E, Yagin FH, Pinar A, Colak C, Kadry S, Kim J. A proposed tree-based explainable artificial intelligence approach for the prediction of angina pectoris. Sci Rep. 2023;13(1):22189. pmid:38092844
  16. Zhu H, Qiao S, Zhao D, Wang K, Wang B, Niu Y, et al. Machine learning model for cardiovascular disease prediction in patients with chronic kidney disease. Front Endocrinol (Lausanne). 2024;15:1390729. pmid:38863928
  17. Kim J, Lee SY, Cha BH, Lee W, Ryu J, Chung YH, et al. Machine learning models of clinically relevant biomarkers for the prediction of stable obstructive coronary artery disease. Front Cardiovasc Med. 2022;9:933803. pmid:35928935
  18. Khashayar P, Dimai HP, Moradi N, Fahimfar N, Gharibzadeh S, Ostovar A, et al. Protocol for a multicentre, prospective cohort study of clinical, proteomic and genomic patterns associated with osteoporosis to develop a multidimensional fracture assessment tool: the PoCOsteo Study. BMJ Open. 2020;10(9):e035363. pmid:32998914
  19. Gita S, Afshin O, Ramin H, Hossein D, Farshad S, Alireza R, et al. Bushehr Elderly Health (BEH) programme: study protocol and design of musculoskeletal system and cognitive function (stage II). BMJ Open. 2017;7(8):e013606. pmid:28780537
  20. Afshin O, Iraj N, Bagher L, Ramin H, Hossein D, Katayoun V, et al. Bushehr Elderly Health (BEH) Programme, phase I (cardiovascular system). BMJ Open. 2015;5(12):e009597. pmid:26674503
  21. Fischbacher CM, Bhopal R, Unwin N, White M, Alberti KG. The performance of the Rose angina questionnaire in South Asian and European origin populations: a comparative study in Newcastle, UK. Int J Epidemiol. 2001;30(5):1009–16. pmid:11689512
  22. Wélén Schef K, Tornvall P, Alfredsson J, Hagström E, Ravn-Fischer A, Soderberg S, et al. Prevalence of angina pectoris and association with coronary atherosclerosis in a general population. Heart. 2023;109(19):1450.
  23. Rose GA. The diagnosis of ischaemic heart pain and intermittent claudication in field surveys. Bull World Health Organ. 1962;27(6):645–58. pmid:13974778
  24. Najafi-Ghezeljeh T, Kassaye Tessama M, Yadavar-Nikravesh M, Ekman I, Emami A. The Iranian version of Angina Pectoris characteristics questionnaire: reliability assessment. J Clin Nurs. 2009;18(5):694–9. pmid:19239536
  25. Najafi-Ghezeljeh T. Corrigendum. Int J Nurs Pract. 2010;16(3):318.
  26. Najafi-Ghezeljeh T, Ekman I, Nikravesh MY, Emami A. Adaptation and validation of the Iranian version of Angina Pectoris characteristics questionnaire. Int J Nurs Pract. 2008;14(6):470–6. pmid:19126076
  27. Petrie CJ, Ponikowski P, Metra M, Mitrovic V, Ruda M, Fernandez A, et al. Proportional pulse pressure relates to cardiac index in stabilized acute heart failure patients. Clin Exp Hypertens. 2018;40(7):637–43. pmid:29265934
  28. Rumbo-Rodríguez L, Sánchez-SanSegundo M, Ferrer-Cascales R, García-D’Urso N, Hurtado-Sánchez JA, Zaragoza-Martí A. Comparison of body scanner and manual anthropometric measurements of body shape: a systematic review. Int J Environ Res Public Health. 2021;18(12):6213. pmid:34201258
  29. Bobos P, Nazari G, Lu Z, MacDermid JC. Measurement properties of the hand grip strength assessment: a systematic review with meta-analysis. Arch Phys Med Rehabil. 2020;101(3):553–65. pmid:31730754
  30. Peeters M, Zondervan-Zwijnenburg M, Vink G, van de Schoot R. How to handle missing data: a comparison of different approaches. Eur J Dev Psychol. 2015;12(4):377–94.
  31. Abu Alfeilat HA, Hassanat ABA, Lasassmeh O, Tarawneh AS, Alhasanat MB, Eyal Salman HS, et al. Effects of distance measure choice on K-nearest neighbor classifier performance: a review. Big Data. 2019;7(4):221–48. pmid:31411491
  32. Harel O, Zhou X-H. Multiple imputation: review of theory, implementation and software. Stat Med. 2007;26(16):3057–77. pmid:17256804
  33. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
  34. Dixon SJ, Brereton RG. Comparison of performance of five common classifiers represented as boundary methods: Euclidean Distance to Centroids, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Learning Vector Quantization and Support Vector Machines, as dependent on data structure. Chemometr Intell Lab Syst. 2009;95(1):1–17.
  35. LaValley MP. Logistic regression. Circulation. 2008;117(18):2395–9.
  36. Chollet F, et al. Keras; 2015. Available from: https://keras.io
  37. Taud H, Mas JF. Multilayer perceptron (MLP). In: Camacho Olmedo MT, Paegelow M, Mas JF, Escobar F, editors. Geomatic approaches for modeling land change scenarios. Cham: Springer International Publishing; 2018. p. 451–5.
  38. Bramer M. Avoiding overfitting of decision trees. In: Bramer M, editor. Principles of data mining. London: Springer London; 2007. p. 119–34.
  39. Qi Y. Random forest for bioinformatics. In: Zhang C, Ma Y, editors. Ensemble machine learning: methods and applications. New York (NY): Springer New York; 2012. p. 307–23.
  40. Azmi SS, Baliga S. An overview of boosting decision tree algorithms utilizing AdaBoost and XGBoost boosting strategies. IRJET. 2020;7(5):6867–70.
  41. Peterson LE. K-nearest neighbor. Scholarpedia. 2009;4(2):1883.
  42. O’Malley T, Bursztein E, Long J, Chollet F, Jin H, Invernizzi L, et al. Keras tuner; 2019. Available from: https://github.com/keras-team/keras-tuner
  43. Berrar D. Cross-validation. In: Ranganathan S, Cannataro M, Khan MA, editors. Encyclopedia of bioinformatics and computational biology. 2nd ed. Elsevier (In Press); 2025.
  44. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32–5. pmid:15405679
  45. Lee C, Kim H. Machine learning-based predictive modeling of depression in hypertensive populations. PLOS ONE. 2022;17(7):e0272330.
  46. Dianati-Nasab M, Salimifard K, Mohammadi R, Saadatmand S, Fararouei M, Hosseini KS, et al. Machine learning algorithms to uncover risk factors of breast cancer: insights from a large case-control study. Front Oncol. 2024;13:1276232. pmid:38425674
  47. Hastie T, Tibshirani R, Friedman J. Random forests. In: The elements of statistical learning: data mining, inference, and prediction. New York (NY): Springer New York; 2009. p. 587–604.
  48. Van Belle V, Pelckmans K, Van Huffel S, Suykens JAK. Improved performance on high-dimensional survival data by application of Survival-SVM. Bioinformatics. 2010;27(1):87–94.
  49. Dietterich T. Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Technical report; 1995.
  50. Alwosheel A, van Cranenburgh S, Chorus CG. Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis. J Choice Model. 2018;28:167–82.
  51. Ahmad A, Shelly-Cohen M, Corban MT, Murphree DH Jr, Toya T, Sara JD, et al. Machine learning aids clinical decision-making in patients presenting with angina and non-obstructive coronary artery disease. Eur Heart J Digit Health. 2021;2(4):597–605. pmid:36713103
  52. Wang Z, Sun Z, Yu L, Wang Z, Li L, Lu X. Machine learning-based prediction of composite risk of cardiovascular events in patients with stable angina pectoris combined with coronary heart disease: development and validation of a clinical prediction model for Chinese patients. Front Pharmacol. 2024;14.
  53. Lanza GA, Cianflone D, Rebuzzi AG, Angeloni G, Sestito A, Ciriello G, et al. Prognostic value of ventricular arrhythmias and heart rate variability in patients with unstable angina. Heart. 2006;92(8):1055–63. pmid:16387812
  54. Coviello I, Pinnacchio G, Laurito M, Stazi A, Battipaglia I, Barone L, et al. Prognostic role of heart rate variability in patients with ST-segment elevation acute myocardial infarction treated by primary angioplasty. Cardiology. 2013;124(1):63–70. pmid:23328532
  55. Chen L, He Y, Wang Y, Liu S, Li Q, Chen J, et al. Association of angina, myocardial infarction and atrial fibrillation-a bidirectional Mendelian randomization study. Br J Hosp Med (Lond). 2024;85(9):1–13.
  56. Violi F, Soliman EZ, Pignatelli P, Pastori D. Atrial fibrillation and myocardial infarction: a systematic review and appraisal of pathophysiologic mechanisms. J Am Heart Assoc. 2016;5(5).
  57. George JN, Nester CM. Syndromes of thrombotic microangiopathy. N Engl J Med. 2014;371(7):654–66.
  58. Sposato LA, Lam M, Allen B, Richard L, Shariff SZ, Saposnik G. First-ever ischemic stroke and increased risk of incident heart disease in older adults. Neurology. 2020;94(15):e1559–70. pmid:32156691
  59. Flossmann E. Genetics of ischaemic stroke; single gene disorders. Int J Stroke. 2006;1(3):131–9. pmid:18706033
  60. Ueda Y, Kosugi S, Abe H, Ozaki T, Mishima T, Date M, et al. Transient increase in blood thrombogenicity may be a critical mechanism for the occurrence of acute myocardial infarction. J Cardiol. 2021;77(3):224–30.
  61. Kosugi S, Ueda Y, Abe H, Ikeoka K, Mishima T, Ozaki T, et al. Temporary rise in blood thrombogenicity in patients with acute myocardial infarction. TH Open. 2021;6(1):e26–32. pmid:35088024
  62. Pagliaro BR, Cannata F, Stefanini GG, Bolognese L. Myocardial ischemia and coronary disease in heart failure. Heart Fail Rev. 2020;25(1):53–65. pmid:31332663
  63. Muzzarelli S, Pfisterer M. Anemia as independent predictor of major events in elderly patients with chronic angina. Am Heart J. 2006;152(5):991–6.
  64. Lanser L, Fuchs D, Scharnagl H, Grammer T, Kleber ME, März W. Anemia of chronic disease in patients with cardiovascular disease. Front Cardiovasc Med. 2021;8.
  65. 65. Sarnak MJ, Tighiouart H, Manjunath G, MacLeod B, Griffith J, Salem D, et al. Anemia as a risk factor for cardiovascular disease in The Atherosclerosis Risk in Communities (ARIC) study. J Am Coll Cardiol. 2002;40(1):27–33. pmid:12103252
  66. 66. Vicente-Ibarra N, Marín F, Pernías-Escrig V, Sandín-Rollán M, Núñez-Martínez L, Lozano T. Impact of anemia as risk factor for major bleeding and mortality in patients with acute coronary syndrome. Eur J Intern Med. 2019;61:48–53.
  67. 67. Grammer TB, Kleber ME, Silbernagel G, Pilz S, Scharnagl H, Tomaschitz A, et al. Hemoglobin, iron metabolism and angiographic coronary artery disease (The Ludwigshafen Risk and Cardiovascular Health Study). Atherosclerosis. 2014;236(2):292–300. pmid:25112800
  68. 68. Deferrari G, Cipriani A, La Porta E. Renal dysfunction in cardiovascular diseases and its consequences. J Nephrol. 2021;34(1):137–53. pmid:32870495
  69. 69. Astor BC, Coresh J, Heiss G, Pettitt D, Sarnak MJ. Kidney function and anemia as risk factors for coronary heart disease and mortality: the Atherosclerosis Risk in Communities (ARIC) Study. Am Heart J. 2006;151(2):492–500. pmid:16442920
  70. 70. Volpe M, Gallo G. Hypertension, coronary artery disease and myocardial ischemic syndromes. Vascul Pharmacol. 2023;153:107230. pmid:37739329
  71. 71. Iglesias P, Benavent M, López G, Arias J, Romero I, Díez JJ. Hyperthyroidism and cardiovascular disease: an association study using big data analytics. Endocrine. 2024;83(2):405–13.
  72. 72. Mahzari MM, Alserehi AH, Almutairi SA, Alanazi KH, Alharbi MA, Mohamud M. Hypothyroidism and the risk of coronary artery disease in Saudi patients. J Family Community Med. 2022;29(1):34–40. pmid:35197726
  73. 73. Toh JZK, Pan X-H, Tay PWL, Ng CH, Yong JN, Xiao J, et al. A meta-analysis on the global prevalence, risk factors and screening of coronary heart disease in nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol. 2022;20(11):2462-2473.e10. pmid:34560278
  74. 74. Lee H, Lee Y-H, Kim SU, Kim HC. Metabolic dysfunction-associated fatty liver disease and incident cardiovascular disease risk: a nationwide cohort study. Clin Gastroenterol Hepatol. 2021;19(10):2138-2147.e10. pmid:33348045
  75. 75. Balbirsingh V, Mohammed AS, Turner AM, Newnham M. Cardiovascular disease in chronic obstructive pulmonary disease: a narrative review. Thorax. 2022;77(9):939–45.
  76. 76. Tsai C-C, Chuang S-Y, Hsieh I-C, Ho L-H, Chu P-H, Jeng C. The association between psychological distress and angina pectoris: a population-based study. PLoS One. 2019;14(11):e0224451. pmid:31703084
  77. 77. Yin H, Liu Y, Ma H, Liu G, Guo L, Geng Q. Associations of mood symptoms with NYHA functional classes in angina pectoris patients: a cross-sectional study. BMC Psychiatry. 2019;19(1):85. pmid:30836983
  78. 78. Xue Y, Yang X, Liu G. Association of combined body mass index and central obesity with cardiovascular disease in middle-aged and older adults: a population-based prospective cohort study. BMC Cardiovasc Disord. 2024;24(1):443. pmid:39180009
  79. 79. Chi JH, Lee BJ. Association of myocardial infarction and angina pectoris with obesity and biochemical indices in the South Korean population. Sci Rep. 2022;12(1):13769. pmid:35962047
  80. 80. Abd Elaziz TA, Eldeeb MA, Mannaa MA, El-Damanhory AS. Relation between central obesity and severity of coronary artery disease in patients undergoing coronary angiography. ZUMJ. 2025;31(3):1031–44.
  81. 81. Chi JH, Lee BJ. Association of relative hand grip strength with myocardial infarction and angina pectoris in the Korean population: a large-scale cross-sectional study. BMC Public Health. 2024;24(1):941. pmid:38566101
  82. 82. Lutski M, Weinstein G, Tanne D, Goldbourt U. Angina pectoris severity and late-life frailty among men with cardiovascular disease. Aging Male. 2020;23(5):1022–9. pmid:31446880
  83. 83. Kim W, Park S-H, Kim W-S, Jang WY, Park EJ, Kang DO, et al. Handgrip strength as a predictor of exercise capacity in coronary heart disease. J Cardiopulm Rehabil Prev. 2020;40(2):E10–3. pmid:32118655
  84. 84. Alshammary AF, Alharbi KK, Alshehri NJ, Vennu V, Ali Khan I. Metabolic syndrome and coronary artery disease risk: a meta-analysis of observational studies. Int J Environ Res Public Health. 2021;18(4):1773. pmid:33670349
  85. 85. Ghavami T, Kazeminia M, Ahmadi N, Rajati F. Global prevalence of obstructive sleep apnea in the elderly and related factors: a systematic review and meta-analysis study. J Perianesth Nurs. 2023;38(6):865–75. pmid:37318436
  86. 86. Ooi EL, Rajendran S. Obstructive sleep apnea in coronary artery disease. Curr Probl Cardiol. 2023;48(8):101178. pmid:35341799
  87. 87. da Silva Machado G, Araújo HGS, Corrêa PB, Santos CC, Barbosa MFNP, Barbosa GNP. The role of obstructive sleep apnea risk (BOAH Score) in predicting angina: evidence from NHANES 2017–2020. Sleep Sci Pract. 2024;8(1):25.
  88. 88. Liu J, Zhu Y, Chang Y, Xu Z, Wang C, Zhang P, et al. Association of objective sleep characteristics and incident angina pectoris: a longitudinal analysis from the sleep heart health study. Nat Sci Sleep. 2023;15:955–65. pmid:38021212
  89. 89. Strenth C, Wani A, Alla R, Khan S, Schneider FD, Thakur B. Obstructive sleep apnea and its cardiac implications in the United States: an age‐stratified analysis between young and older adults. J Am Heart Assoc. 2024;13(12):e033810.
  90. 90. Sarokhani M, Goli M, Salarvand S, Ghanei Gheshlagh R. The prevalence of sleep apnea in iran: a systematic review and meta-analysis. Tanaffos. 2019;18(1):1–10.
  91. 91. Peracaula M, Torres D, Poyatos P, Luque N, Rojas E, Obrador A, et al. Endothelial dysfunction and cardiovascular risk in obstructive sleep apnea: a review article. Life (Basel). 2022;12(4):537. pmid:35455027
  92. 92. Ford ES, Giles WH, Croft JB. Prevalence of nonfatal coronary heart disease among American adults. Am Heart J. 2000;139(3):371–7.
  93. 93. Bodegard J, Erikssen G, Bjornholt JV, Thelle D, Erikssen J. Possible angina detected by the WHO angina questionnaire in apparently healthy men with a normal exercise ECG: coronary heart disease or not? A 26 year follow up study. Heart. 2004;90(6):627–32. pmid:15145862
  94. 94. Rahman MA, Spurrier N, Mahmood MA, Rahman M, Choudhury SR, Leeder S. Rose Angina Questionnaire: validation with cardiologists’ diagnoses to detect coronary heart disease in Bangladesh. Indian Heart J. 2013;65(1):30–9. pmid:23438610