Cardiovascular disease detection: A hybrid machine learning-AI framework for personalized diagnosis and risk assessment | PLOS One

Table 1.

A comparative summary of relevant work on cardiovascular disease prediction using machine learning and deep learning. The table compares methods, datasets, key performance metrics, and key findings. The review highlights the field's major open issues (interpretability, generalizability, computational cost, and data privacy), all of which motivate the proposed hybrid model.

Fig 1.

Schematic architecture of the proposed hybrid SVM-PSO-AI framework for cardiovascular risk prediction. The framework takes four input vectors: V1 (medical imaging), V2 (clinical data), V3 (genomic information), and V4 (medical history).
The data are preprocessed before entering the core model. The Particle Swarm Optimization (PSO) algorithm optimizes the hyperparameters (C, γ) of the Support Vector Machine (SVM) classifier, and the trained SVM produces a cardiovascular risk score. The SHapley Additive exPlanations (SHAP) module then interprets each prediction to provide feature-level explanations.

Fig 2.

Flowchart of the end-to-end predictive process of the hybrid framework. The process begins by reading the dataset or a new patient case.
The input vectors are preprocessed (missing-value handling, normalization), and the initial PSO parameters and weighting coefficients (β1-β4) are set. An initial SVM model is trained, and its hyperparameters are then optimized with the modified PSO algorithm. The optimized SVM is trained on the entire dataset, and its predictions are interpreted with SHAP, which scores each feature's contribution to a given prediction.
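
The flowchart names the preprocessing steps (missing-value handling, normalization) but not the specific methods. As a minimal sketch, assuming mean imputation and z-score standardization (both assumptions, not confirmed by the source), the step could look like this; `impute_and_standardize` is a hypothetical helper:

```python
import math

def impute_and_standardize(rows):
    """Mean-impute missing values (None) and z-score each column.

    Assumed methods: mean imputation and z-score scaling.
    rows: list of equal-length lists of floats, with None marking a missing value.
    Returns the transformed rows and the per-column (mean, std) pairs.
    """
    n_cols = len(rows[0])
    stats = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        var = sum((v - mean) ** 2 for v in observed) / len(observed)
        std = math.sqrt(var) or 1.0  # guard against constant columns
        stats.append((mean, std))
    out = [
        [((r[j] if r[j] is not None else stats[j][0]) - stats[j][0]) / stats[j][1]
         for j in range(n_cols)]
        for r in rows
    ]
    return out, stats

rows = [[1.0, 10.0], [None, 20.0], [3.0, None]]
scaled, stats = impute_and_standardize(rows)
```

The returned per-column statistics should be computed on the training split only and then reused for validation, test, and new patient cases; fitting them on all of the data would leak information across the split.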

Fig 3.

High-level pseudocode for the proposed hybrid model, outlining the sequential integration of its core components.
It begins with ingestion and preprocessing of the four input vectors (V₁-V₄), then initializes the model parameters, namely the weighting coefficients (β₁, β₂, β₃, β₄) and the Particle Swarm Optimization (PSO) parameters. The remainder of the workflow combines SVM training, PSO-based hyperparameter optimization, and AI interpretability for cardiovascular risk prediction.

Fig 4.

Pseudocode for the modified Particle Swarm Optimization (PSO) algorithm used to tune the SVM hyperparameters. The figure shows the iterative optimization of the SVM hyperparameters with PSO.
The algorithm proceeds through initialization, velocity updates, position updates, and a convergence check.
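
A minimal sketch of the loop in Fig 4, assuming a standard global-best PSO (the paper's specific modifications are not described in this caption) that searches over log10(C) and log10(γ). The objective is a toy quadratic standing in for validation error; `pso_minimize` and its default coefficients are illustrative assumptions, not the settings reported in Table 3.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO (an assumed stand-in, not the paper's modified variant).

    objective: maps a position (list of floats) to a cost to minimize.
    bounds: one (low, high) pair per dimension.
    A fixed iteration budget serves as the convergence criterion here.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization: random positions inside the bounds, zero velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_cost[k])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal best + pull toward global best.
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Position update, clipped to the search bounds.
                lo, hi = bounds[d]
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)
            cost = objective(pos[k])
            if cost < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[k][:], cost
    return gbest, gbest_cost

# Toy stand-in for validation error over (log10 C, log10 gamma); minimum at C = 10, gamma = 0.01.
toy = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
best, cost = pso_minimize(toy, bounds=[(-2.0, 3.0), (-4.0, 1.0)])
C, gamma = 10 ** best[0], 10 ** best[1]
```

In practice the objective would train an SVM with C = 10**p[0] and γ = 10**p[1] and return a validation error, for example 1 minus the mean score from scikit-learn's `cross_val_score` applied to an `SVC`. Searching in log space is a common choice because C and γ typically span several orders of magnitude.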

Fig 5.

Pseudocode for the modified Support Vector Machine (SVM) classifier, highlighting the specific changes made to adapt the SVM for cardiovascular risk prediction.
The main changes are: (1) hyperparameters (C, γ) optimized by PSO; (2) a squared hinge loss, which penalizes misclassifications more heavily and defines the margin; and (3) a dynamic class-weighting mechanism applied during training to reduce bias from imbalanced datasets. The result is a trained SVM that produces robust risk scores, together with the full set of performance metrics for evaluation.
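
The squared hinge loss and class weighting in Fig 5 can be sketched as the data term of the objective 1/2 ||w||^2 + C * sum_i w_{y_i} * max(0, 1 - y_i f(x_i))^2. The weighting rule below is the common "balanced" heuristic, used here as a hypothetical stand-in for the paper's dynamic class weighting; both function names are illustrative.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_in_class).

    Assumed heuristic; the paper's exact dynamic weighting rule is not given.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def weighted_squared_hinge(scores, labels, weights):
    """Sum of class-weighted squared hinge losses for labels in {-1, +1}.

    scores: raw SVM decision values f(x_i).
    Each sample contributes weights[y_i] * max(0, 1 - y_i * f(x_i)) ** 2.
    """
    total = 0.0
    for f, y in zip(scores, labels):
        violation = max(0.0, 1.0 - y * f)  # zero when the point is outside the margin
        total += weights[y] * violation ** 2
    return total

labels = [1, -1, -1, -1]              # imbalanced toy labels: one positive, three negatives
scores = [0.5, -2.0, 0.0, 1.0]        # hypothetical decision values
weights = balanced_class_weights(labels)   # {1: 2.0, -1: 0.666...}
loss = weighted_squared_hinge(scores, labels, weights)
```

Because the violation is squared, the negative example scored f = 1.0 (violation 2) contributes 4 × 2/3 rather than 2 × 2/3, the heavier penalty on badly misclassified points that the caption describes; the single positive carries three times the weight of each negative.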

Fig 6.

Illustration of the hyperplane and margin of the modified SVM model. The figure shows how the SVM constructs the optimal hyperplane that separates the two classes with the maximum margin.
Support vectors, the critical data points closest to the hyperplane, define the decision boundary. The margin width reflects the trade-off between maximizing class separation and minimizing classification error. Misclassified samples appear in a separate region, where the squared hinge loss penalizes their error.
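
To make the geometry in Fig 6 concrete, the sketch below assumes a linear kernel for readability (with the RBF kernel implied by γ, the same categories apply to the kernel decision function). It computes the margin width 2/||w|| and labels each training point by its functional margin y_i f(x_i); `margin_report` is a hypothetical helper.

```python
import math

def margin_report(w, b, points, labels, tol=1e-9):
    """Label training points relative to a linear SVM's margin.

    Decision function f(x) = w.x + b with labels in {-1, +1}.
    In a soft-margin SVM, points on the margin, inside it, or misclassified
    (functional margin <= 1) are all support vectors.
    """
    width = 2.0 / math.sqrt(sum(wi * wi for wi in w))
    report = []
    for x, y in zip(points, labels):
        fm = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)  # functional margin
        if fm < 0:
            report.append("misclassified")
        elif fm < 1 - tol:
            report.append("inside")      # correct side, but violates the margin
        elif fm <= 1 + tol:
            report.append("on_margin")
        else:
            report.append("outside")
    return width, report

width, report = margin_report(
    w=[2.0, 0.0], b=0.0,
    points=[[1.0, 0.0], [0.5, 3.0], [0.25, 0.0], [0.5, 0.0]],
    labels=[1, 1, 1, -1],
)
```

Here width is 1.0, and the four points fall outside the margin, on it, inside it, and on the wrong side, respectively.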

Fig 7.

Pseudocode for the AI interpretability module using SHapley Additive exPlanations (SHAP). The module takes the optimized SVM and the input features for a given prediction.
For each prediction, it computes SHAP values, which measure how much each feature (e.g., cholesterol level, age, a genetic marker) contributes to the model output relative to a baseline (average) prediction. The features are then ranked by their SHAP values to identify the most important drivers of the prediction.
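
A small sketch of what the SHAP module computes: exact Shapley values obtained by enumerating feature subsets, with absent features replaced by a single baseline vector. This single-baseline scheme is a simplification assumed here for clarity; the `shap` package's `KernelExplainer`, for example, approximates these values by sampling against a background dataset. The model, feature names, and ranking by absolute value are illustrative assumptions.

```python
from itertools import combinations
from math import factorial

def exact_shapley(model, x, baseline):
    """Exact Shapley values for one prediction by enumerating feature subsets.

    Features outside a coalition take their baseline values. The cost grows
    as 2 ** n_features, so this is practical only for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

features = ["age", "cholesterol", "marker"]              # hypothetical feature names
risk = lambda z: 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]    # toy linear risk model
x, base = [1.0, 3.0, 2.0], [0.0, 0.0, 0.0]
phi = exact_shapley(risk, x, base)
# Rank by absolute contribution (assumed ranking rule).
ranking = [features[i] for i in sorted(range(len(phi)), key=lambda i: -abs(phi[i]))]
```

For a linear model with this baseline, each value reduces to w_i (x_i - baseline_i), here [2, -3, 1], and the values always sum to f(x) - f(baseline).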

Table 2.

Description, feature counts, and data partitioning of the MIMIC-III clinical database (v1.4) used in this study. The table groups patient data into three relevant modalities: clinical data, medical history, and medical imaging. The Count column gives the approximate number of instances of each feature type, indicating the scale and variety of the dataset. The data were split into training (75%), validation (10%), and test (15%) sets with patient-level separation to prevent data leakage.
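
Patient-level separation means that all records from one patient go to the same subset. A minimal sketch, assuming each record is tagged with a patient identifier (the `patient_level_split` helper and its seed are hypothetical): the 75/10/15 fractions are applied to patients, so record-level proportions are only approximate.

```python
import random

def patient_level_split(patient_ids, fractions=(0.75, 0.10, 0.15), seed=42):
    """Split record indices so that each patient appears in exactly one subset.

    patient_ids: one identifier per record (a patient may have many records).
    Returns (train, validation, test) lists of record indices.
    """
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    n = len(unique)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train_p = set(unique[:n_train])
    val_p = set(unique[n_train:n_train + n_val])
    train, val, test = [], [], []
    for idx, pid in enumerate(patient_ids):
        if pid in train_p:
            train.append(idx)
        elif pid in val_p:
            val.append(idx)
        else:
            test.append(idx)  # remaining patients form the test set
    return train, val, test

# 100 hypothetical patients with 1 to 3 records each.
ids = [p for p in range(100) for _ in range(1 + p % 3)]
train, val, test = patient_level_split(ids)
```

For a single two-way split, scikit-learn's `GroupShuffleSplit` provides the same grouping guarantee via its `groups` argument.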

Fig 8.

Sample multi-parameter patient data from the MIMIC-III clinical database. The figure shows the complex, high-resolution time-series data recorded for a single patient during an Intensive Care Unit (ICU) stay.
It illustrates the real-world clinical data (vital signs, laboratory results, and clinical observations) used to evaluate the hybrid model.

Table 3.

Parameter settings for the proposed hybrid model. The Particle Swarm Optimization (PSO) parameters (swarm size, number of iterations, inertia weight, and cognitive and social coefficients) were determined empirically. The weighting coefficients β₁, β₂, and β₄ for the input vectors were tuned to reflect the relative importance of medical imaging, clinical data, and medical history, respectively. Genomic data (β₃) received a weight of 0 because the dataset contains no genomic information. The SVM regularization parameter (C) and kernel parameter (γ) are not fixed in advance; their values are determined by the PSO optimization process.
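
The caption does not state how the β coefficients are applied to V1-V4. One plausible reading, sketched here purely as an assumption, scales each modality's features by its β and concatenates them, dropping any modality whose β is 0. The modality keys and β values in the example are hypothetical, not the values from Table 3.

```python
def fuse_inputs(vectors, betas):
    """Scale each modality's feature vector by its beta and concatenate.

    Hypothetical fusion rule: the source does not specify how betas are applied.
    vectors, betas: dicts keyed by modality name. Modalities with beta == 0
    (genomic data in this study) are skipped, so they add no dimensions.
    """
    fused = []
    for name in sorted(vectors):  # fixed key order keeps feature positions stable
        if betas[name] == 0:
            continue
        fused.extend(betas[name] * v for v in vectors[name])
    return fused

vectors = {"V1_imaging": [1.0, 2.0], "V2_clinical": [0.5],
           "V3_genomic": [9.0], "V4_history": [1.0, 0.0]}
betas = {"V1_imaging": 0.2, "V2_clinical": 0.5,        # hypothetical weights
         "V3_genomic": 0.0, "V4_history": 0.3}
fused = fuse_inputs(vectors, betas)
```

With an RBF kernel, scaling a block by β scales its contribution to the squared distance ||x - x'||² by β², so the coefficients act as per-modality relevance weights inside the kernel.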

Table 4.

Comparison of the proposed hybrid model with state-of-the-art machine learning models on the MIMIC-III test set. Performance metrics are reported as point estimates with 95% confidence intervals in parentheses. The proposed SVM-PSO-AI framework is compared with Logistic Regression, K-Nearest Neighbors (KNN), a Decision Tree/Naive Bayes ensemble, and a Boosted Ensemble model using standard classification metrics: Accuracy, Precision, Recall, F1-Score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). The proposed hybrid model achieves the best predictive performance on every metric considered.
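
The captions do not say how the 95% confidence intervals were obtained. One common approach is the percentile bootstrap over test cases, sketched below under that assumption with a hypothetical set of predictions; `bootstrap_ci` is illustrative.

```python
import random

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a classification metric.

    Resamples test cases with replacement, recomputes the metric on each
    resample, and returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(y_true)
    point = metric(y_true, y_pred)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper

# Hypothetical test set: 200 cases, 10 of them misclassified.
y_true = [1] * 50 + [0] * 150
y_pred = [1 - t for t in y_true[:10]] + y_true[10:]
point, lower, upper = bootstrap_ci(y_true, y_pred, accuracy)
```

Metrics with data-dependent denominators, such as precision, need a guard for resamples with no predicted positives; for patient-level data, resampling patients rather than records respects the grouping used in the split.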

Table 5.

Clinical diagnostic efficacy of the proposed hybrid model compared with benchmark machine learning models. Metrics are reported as point estimates with 95% CIs. This table reports the metrics of greatest interest for clinical deployment. Sensitivity (true positive rate) reflects the ability to correctly identify patients with CVD, while specificity (true negative rate) reflects the ability to correctly rule out patients without CVD. The Negative Likelihood Ratio, (1 - sensitivity)/specificity, quantifies how much the odds of having the disease decrease after a negative test result; smaller values indicate stronger diagnostic power. The proposed model achieves the best balance, with the highest sensitivity and specificity and the lowest Negative Likelihood Ratio, underscoring its strength as a diagnostic tool in clinical practice.
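
The likelihood ratios follow directly from sensitivity and specificity. Plugging in the sensitivity (96.4%) and specificity (98.7%) reported for the proposed model, the sketch below recovers LR− ≈ 0.036 and converts a pre-test probability into a post-test probability via odds; the 30% pre-test risk is a hypothetical value.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob, lr):
    """Update a pre-test probability with a likelihood ratio via the odds form of Bayes' rule."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.964, 0.987)          # reported values for the proposed model
p_after_negative = post_test_probability(0.30, lr_neg)    # hypothetical 30% pre-test risk
```

Here LR− is about 0.0365 and LR+ about 74, so a negative result would lower a 30% pre-test risk to roughly 1.5%.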

Table 6.

Performance comparison of the proposed hybrid model with deep learning models on the MIMIC-III test set. Metrics are reported as point estimates with 95% CIs. The proposed SVM-PSO-AI framework is evaluated against a Convolutional Neural Network (CNN), a combined deep learning/machine learning model, and a Deep Learning Neural Network. The reported metrics confirm the predictive power of the hybrid approach.

Table 7.

Comparison of sensitivity, specificity, and Negative Likelihood Ratio with deep learning models. Metrics are reported as point estimates with 95% CIs. This analysis extends the performance comparison to metrics of central importance for clinical application. Against the deep learning benchmarks, the proposed model achieves a competitive balance of these metrics: it attains the highest specificity (98.7%), minimizing false alarms, and a low Negative Likelihood Ratio (0.036), comparable to the best deep learning results. This indicates that the hybrid framework is a reliable and efficient tool for clinical decision-making.

Fig 9.

Confusion matrix for the proposed hybrid model on the held-out MIMIC-III test set. The matrix summarizes the model's classification results: 2,892 true positives (TP), 6,928 true negatives (TN), 108 false negatives (FN), and 72 false positives (FP).
These results show high sensitivity (96.4%; the ability to correctly identify patients with cardiovascular disease) and high specificity (98.7%; the ability to correctly identify patients without cardiovascular disease), reflecting the model's accuracy and low misclassification rate.
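
Applying the standard definitions to the four counts above gives the headline metrics; `confusion_metrics` is an illustrative helper.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),                 # recall / true positive rate
        "specificity": tn / (tn + fp),                 # true negative rate
        "precision": tp / (tp + fp),                   # positive predictive value
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),             # harmonic mean of precision and recall
    }

m = confusion_metrics(tp=2892, tn=6928, fp=72, fn=108)
# sensitivity = 2892/3000 = 0.964      specificity = 6928/7000 ≈ 0.990
# accuracy = 9820/10000 = 0.982        precision = 2892/2964 ≈ 0.976
# f1 = 5784/5964 ≈ 0.970
```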

Fig 10.

Receiver Operating Characteristic (ROC) curve for the proposed hybrid model. The curve plots the true positive rate (sensitivity) against the false positive rate (1 – specificity) across classification thresholds.
The high Area Under the Curve (AUC = 0.977) indicates strong performance and a good ability to discriminate between patients with and without cardiovascular disease.
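
The AUC has an equivalent probabilistic reading: the probability that a randomly chosen patient with CVD receives a higher risk score than a randomly chosen patient without it. The sketch below computes it that way, with ties counting as half, on a toy example; it is quadratic in the number of cases, so a rank-based or library implementation would be used for a full test set.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Equivalent to the area under the ROC curve (the normalized Mann-Whitney
    U statistic); ties count as half. O(n_pos * n_neg), for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 positive-negative pairs ranked correctly
```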

Table 8.

Ablation study quantifying the contribution of each core component to the performance of the hybrid framework. Three configurations are evaluated: a baseline standard SVM without hyperparameter optimization, the SVM optimized with the proposed modified Particle Swarm Optimization (PSO) algorithm, and the full hybrid model. PSO optimization yields a substantial improvement on all metrics (e.g., +4.4% accuracy, +3.5% AUC), underscoring the pivotal role of hyperparameter tuning. Adding the SHAP explainability module does not degrade performance, showing that the framework achieves high accuracy without sacrificing interpretability, a prerequisite for adoption by healthcare institutions.

Table 9.

Statistical significance testing (p-value analysis) of the performance improvements of the proposed hybrid model over state-of-the-art machine learning models. Each comparison model was tested against the hybrid model on all major performance metrics. The resulting p-values indicate that the hybrid model's better performance is statistically significant (p < 0.05 for all comparisons, with most p-values < 0.005), so the improvement is unlikely to be due to chance and can be attributed to the proposed framework.

Table 10.

P-value analysis for statistical validation of the hybrid model against deep learning models. This analysis extends the validation in Table 9 to advanced deep learning architectures. It indicates that, despite its different structure, the proposed hybrid SVM-PSO-AI model predicts cardiovascular risk more accurately, and that these differences are unlikely to be explained by random variation in the data.
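
Neither Table 9 nor Table 10 names the test behind the reported p-values. As one common option for comparing two classifiers on the same test set, the exact McNemar test uses only the cases on which the two models disagree; the sketch and its counts are hypothetical, not the authors' procedure.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test for two classifiers on the same test set.

    b: cases model A classified correctly and model B did not.
    c: cases model A got wrong and model B got right.
    Under H0 (equal error rates) the b + c discordant cases split as Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical discordant counts: hybrid right / baseline wrong = 40, the reverse = 15.
p_value = mcnemar_exact(40, 15)
```

For differences in AUC specifically, DeLong's test is a frequently used alternative.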

Table 11.

Time complexity and runtime comparison of the proposed hybrid model and other models. This table contrasts the computational characteristics of the evaluated models. The proposed hybrid model's complexity, O(p × i × n² × d), is dominated by PSO-based SVM optimization, where p is the number of particles, i the number of iterations, n the number of samples, and d the number of features. The hybrid model's measured runtime (55 seconds) is higher than that of simpler models such as logistic regression and KNN, but substantially lower than that of complex deep learning models. The hybrid model is therefore competitive in computational efficiency while also being more accurate.
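
A quick reading of O(p × i × n² × d): each of the p × i PSO fitness evaluations trains one SVM, whose kernel computations grow roughly with n² × d. The sketch uses hypothetical sizes and abstract cost units (not seconds) to show how each term scales.

```python
def pso_svm_cost(p, i, n, d):
    """Relative cost units for PSO-tuned SVM training, following O(p * i * n**2 * d).

    p * i fitness evaluations, each training one SVM at roughly n**2 * d kernel work.
    Units are abstract, not seconds.
    """
    return p * i * n ** 2 * d

base = pso_svm_cost(p=20, i=50, n=10_000, d=100)       # hypothetical sizes
ratio_n = pso_svm_cost(20, 50, 20_000, 100) / base     # doubling n quadruples the cost
ratio_p = pso_svm_cost(40, 50, 10_000, 100) / base     # doubling the swarm doubles it
```

The quadratic dependence on n is why the sample count, rather than the swarm size or iteration budget, dominates runtime as the dataset grows.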