Development and validation of a cardiovascular risk prediction model for Sri Lankans using machine learning

Chamila Mettananda; Isuru Sanjeewa; Tinul Benthota Arachchi; Avishka Wijesooriya; Chiranjaya Chandrasena; Tolani Weerasinghe; Maheeka Solangaarachchige; Achila Ranasinghe; Isuru Elpitiya; Rashmi Sammandapperuma; Sujeewani Kurukulasooriya; Udaya Ranawaka; Arunasalam Pathmeswaran; Anuradhini Kasturiratne; Nei Kato; Rajitha Wickramasinghe; Prasanna Haddela; Janaka de Silva

doi:10.1371/journal.pone.0309843

Abstract

Introduction and objectives

Sri Lankans do not have a specific cardiovascular (CV) risk prediction model and therefore, World Health Organization (WHO) risk charts developed for the Southeast Asia Region are being used. We aimed to develop a CV risk prediction model specific for Sri Lankans using machine learning (ML) on data from a population-based, randomly selected cohort of Sri Lankans followed up for 10 years and to validate it in an external cohort.

Material and methods

The cohort consisted of 2596 individuals between 40–65 years of age in 2007, who were followed up for 10 years. Of them, 179 developed hard CV diseases (CVD) by 2017. We developed three CV risk prediction models named model 1, 2 and 3 using ML. We compared predictive performances between models and the WHO risk charts using receiver operating characteristic (ROC) curves. The most predictive and practical model for use in primary care, model 3, was named “SLCVD score”; it uses age, sex, smoking status, systolic blood pressure, history of diabetes, and total cholesterol level in the calculation. We developed an online platform to calculate the SLCVD score. Predictions of SLCVD score were validated in an external hospital-based cohort.

Results

Model 1, 2, SLCVD score and the WHO risk charts predicted 173, 162, 169 and 10 of 179 observed events and the area under the ROC (AUC) were 0.98, 0.98, 0.98 and 0.52 respectively. During external validation, the SLCVD score and WHO risk charts predicted 56 and 18 respectively of 119 total events and AUCs were 0.64 and 0.54 respectively.

Conclusions

SLCVD score is the first and only CV risk prediction model specific for Sri Lankans. It predicts the 10-year risk of developing a hard CVD in Sri Lankans. SLCVD score was more effective in predicting Sri Lankans at high CV risk than WHO risk charts.

Citation: Mettananda C, Sanjeewa I, Benthota Arachchi T, Wijesooriya A, Chandrasena C, Weerasinghe T, et al. (2024) Development and validation of a cardiovascular risk prediction model for Sri Lankans using machine learning. PLoS ONE 19(10): e0309843. https://doi.org/10.1371/journal.pone.0309843

Editor: Gyaneshwer Chaubey, Banaras Hindu University, INDIA

Received: January 23, 2024; Accepted: August 20, 2024; Published: October 22, 2024

Copyright: © 2024 Mettananda et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The datasets used and analysed during the current study are available from the corresponding author or the Ethics Review Committee of the Faculty of Medicine, University of Kelaniya, Sri Lanka, Telephone no: 0112961267, email: ercmed@kln.ac.lk

Funding: This study was supported by the Strengthening Research Outputs Grant of the University of Kelaniya, Sri Lanka (RC/SROG/2021/01). The funding bodies played no role in the design of the study, collection, analysis, and interpretation of data or in writing the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

There are no cardiovascular (CV) risk prediction models specific to or derived from Sri Lankans. Therefore, different risk prediction models derived from white Caucasians, or models developed for the Southeast Asia region (SEAR), are being used for CV risk stratification of Sri Lankans, which is not ideal. It has been shown that risk stratification of the same Sri Lankan cohort differs between World Health Organization/International Society of Hypertension (WHO/ISH) risk charts and the Framingham score [1], and between WHO/ISH charts, National Cholesterol Education Program Adult Treatment Panel III (NCEP-ATP III) scores and Systematic Coronary Risk Evaluation (SCORE) charts [2]. WHO risk charts for the SEAR-B region developed in 2007 have been validated among Sri Lankans and are the best currently available risk stratification method for Sri Lankans. In that validation, predictions showed 81% agreement with observed events but were less predictive in females and those at high CV risk [3]. Sri Lanka is classified under the SEAR epidemiological sub-region with Indonesia, Cambodia, Laos, Maldives, Myanmar, Malaysia, Philippines, Thailand, Timor-Leste, Viet Nam, Mauritius, and Seychelles [4, 5], but the lifestyle, socio-economic and cultural backgrounds and risk behaviours of Sri Lankans differ from those of people living in other SEAR countries, and therefore the regional charts may not predict the CVD risk of Sri Lankans exactly. It has been shown that CVD risk stratification using machine learning of long-term follow-up cohorts is much more predictive than the risk models developed for a group of countries [6–10].

Therefore, we aimed to develop a CV risk prediction model using long-term follow-up data from a community-based cohort developed to study metabolic risk factors and non-communicable diseases of Sri Lankans and to validate the new model in an external cohort of Sri Lankans. In addition, we planned to develop an online platform with the best model identified for practical use of the model in any Sri Lankan.

Materials and methods

Study setting

We used 10-year follow-up data of the Ragama Health Study (RHS), a community-based ongoing study started in 2007 to study the epidemiology of metabolic and non-communicable diseases (NCD) in Sri Lanka. Details of the Ragama Health Study have been described previously [11].

The baseline cohort in the RHS comprised 35–64-year-old adults resident in the Ragama Medical Officer of Health (MOH) area (a health administrative area of Sri Lanka). The cohort was selected randomly and stratified by three age groups (35–44, 45–54 and 55–64 years) using the voters’ list. The study was started on the 1st of February 2007. At baseline in 2007, all selected individuals were visited at their homes and data on past medical history and behavioural, metabolic and all potential risk factors for NCDs were collected from all consenting adults. The cohort was re-contacted in 2014 and 2017, and any CVD and new risk factors developed over the follow-up period were documented. All cardiovascular deaths, non-fatal strokes, and non-fatal myocardial infarctions, including elective percutaneous coronary interventions and coronary artery bypass grafts done on patients with symptomatic unstable angina, that occurred from 2007 to 2017 were recorded as CVD during these follow-up visits by interviewing patients and their families and perusing clinical notes/death certificates. Of the 2923 participants enrolled at the beginning of the RHS study, 2685 were followed up after 10 years while the remaining 238 could not be traced. The group lost to follow-up was not significantly different from the group followed in terms of gender, age, current smoking status, or fasting blood sugar (FBS), systolic blood pressure (SBP), or total cholesterol level at baseline [3]. The study was approved by the Ethics Review Committee of the Faculty of Medicine, University of Kelaniya, Sri Lanka (P38/09/2006). All participants provided written informed consent before enrolment.

Study population

Of the RHS cohort, we selected participants of 40 years or above who were naïve for CVD at baseline (2007) and were followed up for 10 years up to 2017. Participants who could not be traced in 2017 or whose current status (dead or alive) could not be determined were excluded. Deceased participants whose cause of death could not be verified were also excluded from the analysis.

Data management, model development and statistical analysis

Baseline and follow-up data were extracted from the RHS database. Data extraction and model development started on 8th December 2021. Authors had access to information that could identify individual participants during and after data collection. ML-based models were trained using the database. Missing data in the dataset were addressed through a systematic approach. Initially, null values were filled with the mean, ensuring a representative and unbiased imputation. Subsequently, rows containing null values in crucial features, as identified by Random Forest Classifier (RFC) feature importance, were removed to preserve the integrity of the essential information. The determination of feature importance through RFC contributed to a targeted handling of missing values, prioritising influential features. Duplicates were identified and removed before training ML models to prevent biases, overfitting, and distorted performance metrics, and to ensure that the models learned from diverse and representative data. All potential risk variables were evaluated for inclusion in the model using feature importance analyses conducted with Random Forest and gradient descent algorithms. Synthetic Minority Over-sampling Technique (SMOTE) [12] was applied to overcome class imbalance within the dataset, as one class (the individuals without CVD) significantly outnumbered the other class (the individuals with CVD). Using SMOTE, we oversampled the minority class by creating synthetic instances, rectifying the class distribution imbalance in the dataset. The oversampling process was parameterized with a specific sampling strategy set at 0.65 or dynamically adjusted if unspecified. 10-fold stratified cross-validation was implemented using the Stratified K-Fold technique [13] to rigorously evaluate the model performance.
This method ensured that each cross-validated fold maintains a proportional representation of class instances similar to the original dataset. Within each fold, the dataset was partitioned into a training and a testing dataset. The model was trained on the training dataset and evaluated on the testing dataset, systematically gauging its robustness and generalization across diverse subsets of data. The combined use of SMOTE for oversampling and Stratified K-Fold for cross-validation provided a comprehensive exploration of the ML-based models’ performance characteristics, particularly in the context of imbalanced data. We developed prediction models using Random Forest Classifier [14]. RFC models were fine-tuned using the Grid Search Cross-Validation approach, which involved systematically testing a range of hyperparameter values to identify the combination that produces the best model performance. The performances of the models were assessed through key metrics, including mean F1 score, recall, precision, and accuracy [15]. The mean F1 score was used in selecting the models, especially because the dataset was imbalanced and the F1 score combines precision and recall. A mean F1 score above 0.8 indicates a model with good proficiency in identifying individuals at risk. A precision above 0.75 indicates reliability in the accuracy of positive identifications. A recall score above 0.85 indicates high sensitivity in minimising false negatives. An accuracy exceeding 0.85 reflects a model’s overall correctness. Therefore, a model achieving a mean F1 score above 0.8, precision above 0.75, recall above 0.85, and accuracy above 0.85 was considered an effective model in risk prediction. In addition, we used receiver operating characteristic (ROC) curves to measure the validity of predictions. We developed three models using different combinations of data of the cohort. In addition, we calculated CV risk predictions of all individuals with the 2019 WHO CV risk models.
We calculated the WHO risk (laboratory-based) with the whoishRisk R package [4, 16, 17]. We compared the predictions of the three models and the WHO risk charts.
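The resampling and cross-validation steps described above can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' code: the function names `stratified_folds` and `smote_like`, the toy data, the neighbour count `k=5`, and the choice to oversample only the training part of a fold are all assumptions. The sketch also assumes that the 0.65 sampling strategy means the ratio of minority to majority rows after resampling.

```python
import math
import random


def stratified_folds(labels, n_splits=10, seed=0):
    """Assign sample indices to folds so that every fold keeps roughly
    the class proportions of the full data set."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds


def smote_like(minority, n_majority, ratio=0.65, k=5, seed=0):
    """Create synthetic minority rows by interpolating between a minority
    row and one of its k nearest minority neighbours, until
    len(minority) + len(synthetic) == int(ratio * n_majority)."""
    rng = random.Random(seed)
    n_new = max(0, int(ratio * n_majority) - len(minority))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical): 200 people, 2 features, 20 events (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [1] * 20 + [0] * 180

folds = stratified_folds(y, n_splits=10)

# Use fold 0 as the test set; oversample the training part only (assumption).
test_idx = set(folds[0])
train_idx = [i for i in range(len(y)) if i not in test_idx]
train_minority = [X[i] for i in train_idx if y[i] == 1]
n_train_majority = sum(1 for i in train_idx if y[i] == 0)
synthetic = smote_like(train_minority, n_train_majority, ratio=0.65)

print("events per fold:", [sum(y[i] for i in fold) for fold in folds])
print("synthetic minority rows added:", len(synthetic))
```

Oversampling inside the training portion keeps synthetic rows out of the test fold. The text does not say whether the study resampled before or within the folds, so this ordering is a design choice of the sketch only.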

Secondly, we externally validated the best of the three models in a separate hospital-based database of consecutive patients, 40–74 years of age, admitted to Colombo North Teaching Hospital (a tertiary care hospital in Sri Lanka) from 1st of January 2019 to 1st of August 2020, who did not have a history of CVD, presented with either an acute incident CVD (acute myocardial infarction or acute stroke) or a disease other than an acute CVD, and had complete data for CVD risk calculation. We used this method to increase the yield of CVDs in the validation sample as our aim was to study the accuracy of the new model in predicting incident CVDs. All patients’ predicted CVD risks were calculated with the new models and with 2019 WHO risk charts using the most recent pre-morbid risk factor data available up to one year before developing the incident CVD/admission to the ward. We compared the predictions of the models with observed events using confusion matrices.

Thirdly, we developed an online platform for easy and practical use of the new model among all Sri Lankans.

All statistical analyses were done using SPSS version 22. Categorical data are reported as percentages. Continuous variables are reported as means with standard deviation (SD) or 95% confidence intervals. The significance level was set at p <0.05.

Ethics approval

Ethics approval was obtained from the Ethics Review Committee of the Faculty of Medicine, University of Kelaniya, Sri Lanka for the original RHS study (P38/09/2006) and the ML development (P61/09/2020). Written informed consent of participants was obtained for the recruitment and follow-up interviews.

Results

Data of 2596 participants with complete data of 10-year follow-up were selected to train the ML-based models. The baseline characteristics of the study cohort are shown in Table 1. Of them, 179 developed hard CVD events over the 10-year follow-up.

Table 1. Baseline characteristics of the model development cohort.

https://doi.org/10.1371/journal.pone.0309843.t001

Over the 10-year follow-up period, 179 hard CVD events were recorded: 57 CV deaths, 97 IHD (83 myocardial infarctions, 8 coronary artery bypass grafts, 6 primary percutaneous coronary interventions) and 25 strokes. Of the CVD events, 66 (36.9%) occurred in females and 113 (63.1%) in males.

Three models were developed using different combinations of variables. All the variables according to feature importance were used in Model 1, i.e., age, sex, systolic blood pressure (SBP), diastolic blood pressure (DBP), glycosylated haemoglobin level (HbA1), the ratio of total cholesterol to high-density lipoprotein level (TC/HDL), body mass index (BMI) and the duration of smoking etc. Considering the need of the model for practical use in the community, two more models were developed using freely available data in primary care. Model 2 uses age, sex, smoking status, systolic blood pressure, history of diabetes, and LDL level, and model 3 uses age, sex, smoking status, systolic blood pressure, history of diabetes, and total cholesterol level, which are the same variables used in the WHO CV risk prediction charts.

The performance characteristics of the three new models and the WHO risk charts in terms of predicting individuals at high risk of CVD are shown in Fig 1 and Table 2.

Fig 1. Comparison of 10-year cardiovascular risk predictions against observed events using different prediction models in the model development cohort.

Panel A–Receiver operating characteristic curves. Panel B–Confusion matrixes.

https://doi.org/10.1371/journal.pone.0309843.g001

Table 2. Comparison of the predictive performances of the models.

https://doi.org/10.1371/journal.pone.0309843.t002

Model 1, 2, 3 and the WHO risk charts predicted 173, 162, 169 and 10 out of 179 observed CVDs and the areas under the ROC (AUC) were 0.98, 0.98, 0.98 and 0.52 respectively. All three new ML-based models had mean F1 scores above 0.8, precision above 0.75, recall above 0.85, and accuracy above 0.85, indicating effective prediction. The WHO risk charts had a good accuracy of 0.92, but recall was very low, indicating a high chance of missing high-risk individuals.
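As a worked check of the recall figures implied by these counts, the short sketch below divides the number of predicted events by the 179 observed events for each model. It assumes that "predicted N out of 179" refers to true positives among people who had an event; the variable names are illustrative only.

```python
OBSERVED_EVENTS = 179
RECALL_THRESHOLD = 0.85  # recall threshold stated in the methods

# Events correctly predicted by each model, as reported in the text.
predicted_events = {
    "Model 1": 173,
    "Model 2": 162,
    "Model 3 (SLCVD score)": 169,
    "WHO risk charts": 10,
}

# Recall (sensitivity) = true positives / all observed events.
recall = {name: tp / OBSERVED_EVENTS for name, tp in predicted_events.items()}

for name, value in recall.items():
    status = "above" if value > RECALL_THRESHOLD else "below"
    print(f"{name}: recall = {value:.3f} ({status} the 0.85 threshold)")
```

Under this reading, the three ML-based models have recalls of about 0.97, 0.91 and 0.94, while the WHO risk charts identify about 6% of the observed events.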

We named model 3 the “Sri Lanka CVD score (SLCVD score)”; it is the most practical to use for screening in primary care, even in rural communities, as it only needs total cholesterol level as a laboratory investigation in calculating the risk. We validated the predictions of the SLCVD score in an external hospital-based cohort of Sri Lankans. The baseline characteristics of the external validation cohort are given in Table 3.

Table 3. Baseline characteristics of the external validation cohort.

https://doi.org/10.1371/journal.pone.0309843.t003

The external validation cohort consisted of 119 patients with incident CVDs and 239 people naïve for CVD. We compared the predictions and observed events of the SLCVD score and the WHO 2019 score in the external validation cohort (Fig 2). SLCVD score and WHO risk chart predicted 56 and 18 CV events out of 119 observed events respectively. The SLCVD score was able to predict 38 more cases compared to the predictions of the WHO risk charts. SLCVD score had a 36.1% positive predictive value and 69.0% negative predictive value.
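The reported figures are enough to reconstruct a confusion matrix for the SLCVD score in the external cohort. The sketch below searches for the false-positive count that reproduces the reported positive and negative predictive values. The false-positive and true-negative counts are inferred here and are not stated in the text, so they should be read as an illustration under that assumption.

```python
TP = 56            # events predicted by the SLCVD score (reported)
EVENTS = 119       # patients with incident CVD (reported)
NON_EVENTS = 239   # people naive for CVD (reported)
FN = EVENTS - TP   # events the score missed

# Try every possible false-positive count and keep those that reproduce
# the reported PPV (36.1%) and NPV (69.0%) after rounding to one decimal.
candidates = []
for fp in range(NON_EVENTS + 1):
    tn = NON_EVENTS - fp
    ppv = 100 * TP / (TP + fp)
    npv = 100 * tn / (tn + FN)
    if round(ppv, 1) == 36.1 and round(npv, 1) == 69.0:
        candidates.append((fp, tn))

print("consistent (FP, TN) pairs:", candidates)

FP, TN = candidates[0]
sensitivity = TP / EVENTS
specificity = TN / NON_EVENTS
who_sensitivity = 18 / EVENTS  # WHO charts predicted 18 of 119 events

print(f"SLCVD sensitivity: {sensitivity:.3f}")
print(f"SLCVD specificity: {specificity:.3f}")
print(f"WHO sensitivity:   {who_sensitivity:.3f}")
```

Only one pair of counts matches both reported values, which suggests the SLCVD score flagged 155 people as high risk in this cohort, 56 of whom had an event.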

Fig 2. Comparison of cardiovascular risk predictions against observed events using different prediction models in the external validation cohort.

Panel A–Receiver operating characteristic curves. Panel B–Confusion matrixes.

https://doi.org/10.1371/journal.pone.0309843.g002

Finally, we developed an online platform to calculate the SLCVD score for easy and practical use of the model by any Sri Lankan anywhere in the country. It is available online as the SLCVD score calculator.

Discussion

We developed the first-ever CVD risk prediction model, “SLCVD score”, specific for Sri Lankans using machine learning of a Sri Lankan cohort prospectively followed up for 10 years. This is the first CV risk prediction model developed using individual data of Sri Lankans and the only risk prediction model specific to Sri Lankans. The SLCVD score was better in predicting Sri Lankans at high CVD risk than the WHO risk charts developed for the Southeast Asia region. The SLCVD score is simple and uses only a few easily available data in risk stratification enabling its usage even in rural Sri Lanka. We developed an online platform for SLCVD score calculation to make it user-friendly and available for risk stratification of any Sri Lankan anywhere in the world.

The superior ability of ML-based risk assessments compared with the WHO risk charts in identifying high-risk individuals was also observed previously in an initial trial of ML model development using all available data variables (75 variables) of the same Sri Lankan cohort [18, 19]. The ability to detect high-risk patients efficiently for primary prevention is very important for a low-middle income country like Sri Lanka, as those are the individuals who benefit most from treatment and who should be treated aggressively in a risk-based approach rather than a blanket population-based approach to primary prevention [20]. Even though the WHO risk charts had good accuracy, they were less sensitive in detecting high-risk individuals. Accuracy is a good measure when the data are quite balanced and when all types of outputs are of equal interest [21]. However, as our data set is quite imbalanced and we are more interested in detecting high-risk individuals, accuracy is not the best measure for identifying the best model. It is important not to miss any high-risk individuals, as missing a positive case has a much bigger cost than wrongly classifying somebody as having a high risk of CVD. Therefore, maximizing precision and recall is important, and the SLCVD score fared well compared with the WHO risk charts. Neither precision nor recall is necessarily useful alone since we are interested in the overall picture. The F1 score combines precision and recall and also works where data sets are imbalanced, and the SLCVD score had a better F1 score than the WHO risk charts.
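The point about accuracy on imbalanced data can be illustrated with the size of the development cohort. The sketch below evaluates a hypothetical rule that labels nobody as high risk; the `metrics` helper and the rule itself are illustrative and are not part of the study.

```python
def metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix
    counts, using 0.0 where a ratio would divide by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


TOTAL = 2596                  # development cohort size (reported)
EVENTS = 179                  # hard CVD events over 10 years (reported)
NON_EVENTS = TOTAL - EVENTS   # 2417 people without an event

# Hypothetical rule: classify every participant as "not high risk".
all_negative = metrics(tp=0, fp=0, fn=EVENTS, tn=NON_EVENTS)

for name, value in all_negative.items():
    print(f"{name}: {value:.3f}")
```

This rule reaches an accuracy of about 0.93 while detecting no events at all, which is why recall and the F1 score are more informative than accuracy for this cohort.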

ML has been shown to be more effective in predicting events as it carefully studies the behaviours of cohorts over long periods [6–9, 22]. However, using optimal algorithms in developing the ML models is essential [23, 24]. Since the SLCVD score was developed by studying individual data of a cohort of Sri Lankans followed up for 10 years, its likelihood of being more predictive for Sri Lankans than a model developed for an epidemiological region covering several nations without studying individual follow-up data is understandable. Moreover, the satisfactory validation of the SLCVD score in the external validation cohort and the superiority in predicting high-risk individuals of the external validation cohort compared to WHO risk charts attest to the reliability of the predictions of the SLCVD score.

ML-based models using more variables, like model 1, were able to predict more events than the SLCVD score, but we selected the SLCVD score for the online risk calculator development because a simple score using only a few freely available data is the fundamental need of a screening test, especially in a resource-limited setting like Sri Lanka. With the widespread use of smartphones, the online risk calculator can be used by any healthcare or even non-healthcare personnel to calculate CVD risk without having to refer to charts. This will reduce the time spent on risk stratification of individuals, increase uptake of risk stratification practice in busy primary healthcare settings, and eventually help prevention of non-communicable diseases.

There are several strengths in our study leading to the validity of the SLCVD score. We used data of a Sri Lankan, community-based, randomly selected, semi-urban cohort, individually and prospectively followed up for 10 years to develop the model. Only the data of participants who completed 10-year follow-up were used in the development of the ML model. Patients were followed up by medical officers using face-to-face interviews and medical records and/or death certificates; therefore, self-reporting bias was minimal, and data quality was guaranteed. Hard CVD endpoints were used, and therefore, the data were clear and accurate. We validated the model internally and externally in a separate cohort, and both showed similar predictive performances. We also compared the predictions of the SLCVD score with those of the reference WHO model, and it also showed superiority in predicting high-risk individuals. However, our study has limitations. Even though the cohort we used to train the ML model was a community-based, multi-ethnic random cohort, representation of the estate sector was less in our cohort compared to national distribution. The national distribution of the population in Sri Lanka of urban: rural: estate sectors at the 2012 census was 18.2: 77.4: 4.4 [25] and the same in the Gampaha district, where the cohort was drawn from, was 15.6: 84.3: 0.1 [26]. However, the estate sector makes up only a small percentage of the national population, and therefore, the effect of this limitation is expected to be minimal. Further, we compared risk factors between cases and controls to develop the model, but some behavioural risk associations, like compliance with medications, were not studied. We believe that the impact of those factors was minimal as the sample was randomly selected.

In conclusion, we developed the SLCVD score using ML of a Sri Lankan cohort followed up for 10 years for CV risk prediction of Sri Lankans. SLCVD score is more efficient in predicting Sri Lankans at high CVD risk compared to the currently used WHO risk charts developed for the Southeast Asia region. SLCVD score can be calculated using an online calculator which is free to be used by anybody. The score calculator uses only six freely available variables, namely age, sex, smoking status, diabetes status, systolic blood pressure and total cholesterol level, and predicts a 10-year risk of developing a hard CVD (i.e., cardiovascular deaths, non-fatal strokes, and non-fatal myocardial infarctions including elective percutaneous coronary interventions and coronary artery bypass grafts). This calculator is a screening tool valid for any Sri Lankan above 40 years of age who has not had any CVD at the time of calculation. A risk score equal to or more than 20% indicates an individual is at high risk of developing a hard CVD in the next 10 years and they should be aggressively treated with primary preventive measures. The risk scores can be used in executing guideline-based management of hyperlipidaemia and hypertension among Sri Lankans. Serial risk calculations could be used to objectively study the success of primary prevention interventions.
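The eligibility rule and the 20% threshold described above can be expressed as a small sketch. It does not reproduce the SLCVD model or the online calculator: the class `CalculatorInput`, its field names, the functions `is_eligible` and `risk_category`, and the example values are all hypothetical. The age cut-off of 40 years or above follows the study population definition in the methods.

```python
from dataclasses import dataclass

HIGH_RISK_THRESHOLD = 20.0  # percent; threshold stated in the text
MIN_AGE_YEARS = 40          # study population: 40 years or above


@dataclass
class CalculatorInput:
    """The six variables named in the text, plus prior-CVD status.
    Field names are illustrative, not the calculator's real interface."""
    age_years: int
    sex: str
    current_smoker: bool
    has_diabetes: bool
    systolic_bp_mmhg: float
    total_cholesterol: float  # unit not stated in the text
    prior_cvd: bool = False


def is_eligible(person: CalculatorInput) -> bool:
    """The score applies to people aged 40+ with no CVD so far."""
    return person.age_years >= MIN_AGE_YEARS and not person.prior_cvd


def risk_category(risk_percent: float) -> str:
    """Map a predicted 10-year risk (in percent) to a category."""
    return "high" if risk_percent >= HIGH_RISK_THRESHOLD else "not high"


# Hypothetical person and hypothetical predicted risk values.
person = CalculatorInput(55, "female", False, True, 142.0, 5.6)
print("eligible:", is_eligible(person))
print("20.0% ->", risk_category(20.0))
print("19.9% ->", risk_category(19.9))
```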

Acknowledgments

We thank all who continuously supported the Ragama Health Study, and especially the study participants for their continued cooperation.

Patient and public involvement statement

It was not appropriate or possible to involve patients or the public in the design or reporting plans of our research, but they were involved in the conduct and dissemination of the original RHS study. The results of the current study will be disseminated to study participants, other patients and the public following the publication of the study.

References

1. Mettananda KCD, Gunasekara N, Thampoe R, Madurangi S, Pathmeswaran A. Place of cardiovascular risk prediction models in South Asians; agreement between Framingham risk score and WHO/ISH risk charts. Int J Clin Pract. 2021;75(7):e14190. Epub 2021/03/30. pmid:33780102.
2. Ranawaka U, Wijekoon N, Pathmeswaran P, Kasturiratne A, Gunasekara D, Chackrewarthy S, et al. Risk estimates of cardiovascular diseases in a Sri Lankan community. Ceylon Med J. 2016;61:11. pmid:27031973
3. Thulani UB, Mettananda KCD, Warnakulasuriya DTD, Peiris TSG, Kasturiratne K, Ranawaka UK, et al. Validation of the World Health Organization/ International Society of Hypertension (WHO/ISH) cardiovascular risk predictions in Sri Lankans based on findings from a prospective cohort study. PLoS One. 2021;16(6):e0252267. Epub 2021/06/08. pmid:34097699; PubMed Central PMCID: PMC8183983.
4. WHO. World Health Organization cardiovascular disease risk charts: revised models to estimate risk in 21 global regions. Lancet Glob Health. 2019;7(10):e1332-e45. Epub 2019/09/07. doi: https://doi.org/10.1016/s2214-109x(19)30318-3. PubMed PMID: 31488387; PubMed Central PMCID: PMC7025029.
5. WHO. World Health Organization/International Society of Hypertension risk prediction charts for 14 WHO epidemiological sub-regions: WHO; 2007.
6. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine learning improve cardiovascular risk prediction using routine clinical data? PLOS ONE. 2017;12(4):e0174944. pmid:28376093
7. Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M. Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants. PLoS One. 2019;14(5):e0213653. Epub 20190515. pmid:31091238; PubMed Central PMCID: PMC6519796.
8. Pal M, Parija S, Panda G, Dhama K, Mohapatra RK. Risk prediction of cardiovascular disease using machine learning classifiers. Open Med (Wars). 2022;17(1):1100–13. Epub 20220617. pmid:35799599; PubMed Central PMCID: PMC9206502.
9. Xi Y, Wang H, Sun N. Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: A study involving 143,043 Chinese patients with hypertension. Front Cardiovasc Med. 2022;9:1025705. Epub 20221114. pmid:36451926; PubMed Central PMCID: PMC9701715.
10. Jia Y, Yu G, Ju-Jiao K, Hui-Fu W, Ming Y, Jian-Feng F, et al. Development of machine learning-based models to predict 10-year risk of cardiovascular disease: a prospective cohort study. Stroke and Vascular Neurology. 2023:svn-2023-002332. pmid:37105576
11. Dassanayake AS, Kasturiratne A, Rajindrajith S, Kalubowila U, Chakrawarthi S, De Silva AP, et al. Prevalence and risk factors for non-alcoholic fatty liver disease among adults in an urban Sri Lankan population. J Gastroenterol Hepatol. 2009;24(7):1284–8. Epub 2009/05/30. pmid:19476560.
12. WHO. 2022. Available from: https://www.who.int/srilanka/news/detail/11-02-2020-who-country-office-sri-lanka-launches-a-model-health-corner#:~:text=Non%2Dcommunicable%20diseases%20(NCDs),in%20four%20adults%20consume%20tobacco.
13. Chawla N, Bowyer K, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. ArXiv. 2002;abs/1106.1813.
14. Parmar A, Katariya R, Patel V, editors. A Review on Random Forest: An Ensemble Classifier. International Conference on Intelligent Data Communication Technologies and Internet of Things (ICICI) 2018; 2019; Cham: Springer International Publishing.
15. Cichosz P. Assessing the quality of classification models: Performance measures and evaluation procedures. Open Engineering. 2011;1(2):132–58.
16. Collins D, Lee J, Bobrovitz N, Koshiaris C, Ward A, Heneghan C. whoishRisk – an R package to calculate WHO/ISH cardiovascular risk scores for all epidemiological subregions of the world [version 2; peer review: 3 approved]. F1000Research. 2017;5(2522). pmid:28357040
17. Collins D. Update of WHOISHRisk function based on WHO 2019 paper #1 2024 [updated 08.03.2020; cited 2024 01.08.2024]. Available from: https://github.com/DylanRJCollins/whoishRisk/pull/1/commits/b59c700d758512464e554cac6650850983efdc48.
18. Mettananda C SM, Haddela PS, Dassanayake AS, Kasturiratne A, Wickramasinghe AR, de Silva HJ. Efficacy of Cardiovascular Disease risk prediction using Machine Learning compared to World Health Organization risk charts, based on data derived from a prospective cohort of Sri Lankans. Journal of the Ceylon College of Physicians. 2023;1(55).
19. Mettananda C SM HP, Dassanayake AS, Kasturiratne A, Wickramasinghe AR, de Silva HJ. Efficacy of Cardiovascular Disease risk prediction using Machine Learning compared to World Health Organization risk charts, based on data derived from a prospective cohort of Sri Lankans. BMJopen (being reviewed). 2023.
20. Zulman DM, Vijan S, Omenn GS, Hayward RA. The relative merits of population-based and targeted prevention strategies. Milbank Q. 2008;86(4):557–80. pmid:19120980; PubMed Central PMCID: PMC2690369.
21. Kuhn M, Johnson K. Measuring Performance in Classification Models. In: Kuhn M, Johnson K, editors. Applied Predictive Modeling. New York, NY: Springer New York; 2013. p. 247–73.
22. Dalal S, Goel P, Onyema EM, Alharbi A, Mahmoud A, Algarni MA, et al. Application of Machine Learning for Cardiovascular Disease Risk Prediction. Computational Intelligence and Neuroscience. 2023;2023:9418666.
23. Krittanawong C, Virk HUH, Bangalore S, Wang Z, Johnson KW, Pinotti R, et al. Machine learning prediction in cardiovascular diseases: a meta-analysis. Sci Rep. 2020;10(1):16057. Epub 20200929. pmid:32994452; PubMed Central PMCID: PMC7525515.
24. Chiarito M, Luceri L, Oliva A, Stefanini G, Condorelli G. Artificial Intelligence and Cardiovascular Risk Prediction: All That Glitters is not Gold. Eur Cardiol. 2022;17:e29. Epub 20221220. pmid:36845218; PubMed Central PMCID: PMC9947926.
25. Department of Census and Statistics SL. Census of Population and Housing Sri Lanka 2012. 2012.
26. Department of Census and Statistics SL. Census of Population and Housing of Sri Lanka, 2012, Gampaha District. Colombo: Department of Census and Statistics, 2012.

