Predicting the risk of hypertension using machine learning algorithms: A cross sectional study in Ethiopia

Md. Merajul Islam; Md. Jahangir Alam; Md Maniruzzaman; N. A. M. Faisal Ahmed; Md Sujan Ali; Md. Jahanur Rahman; Dulal Chandra Roy

doi:10.1371/journal.pone.0289613

Abstract

Background and objectives

Hypertension (HTN), a major global health concern, is a leading cause of cardiovascular disease, premature death, and disability worldwide. It is important to develop an automated system to diagnose HTN at an early stage. Therefore, this study devised a machine learning (ML) system for predicting patients at risk of developing HTN in Ethiopia.

Materials and methods

The HTN data were collected in Ethiopia and included 612 respondents with 27 factors. We employed the Boruta-based feature selection method to identify the important risk factors of HTN. Four well-known models [logistic regression, artificial neural network, random forest, and extreme gradient boosting (XGB)] were developed on the training set to predict HTN patients using the selected risk factors. The performance of the models was evaluated on the testing set by accuracy, precision, recall, F1-score, and area under the curve (AUC). Additionally, SHapley Additive exPlanations (SHAP), an explainable artificial intelligence (XAI) method, was used to investigate the associated predictive risk factors of HTN.

Results

The overall prevalence of HTN was 21.2%. This study showed that the XGB-based model was the most appropriate model for predicting patients at risk of HTN, achieving an accuracy of 88.81%, precision of 89.62%, recall of 97.04%, F1-score of 93.18%, and AUC of 0.894. The XGB model with SHAP analysis revealed that age, weight, fat, income, body mass index, diabetes mellitus, salt, history of HTN, drinking, and smoking were the associated risk factors of developing HTN.

Conclusions

The proposed framework provides an effective tool for accurately predicting individuals in Ethiopia who are at risk for developing HTN at an early stage and may help with early prevention and individualized treatment.

Citation: Islam MM, Alam MJ, Maniruzzaman M, Ahmed NAMF, Ali MS, Rahman MJ, et al. (2023) Predicting the risk of hypertension using machine learning algorithms: A cross sectional study in Ethiopia. PLoS ONE 18(8): e0289613. https://doi.org/10.1371/journal.pone.0289613

Editor: Ali Garavand, Lorestan University of Medical Sciences, ISLAMIC REPUBLIC OF IRAN

Received: April 16, 2023; Accepted: July 22, 2023; Published: August 24, 2023

Copyright: © 2023 Islam et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data used for our paper can be accessed at: https://figshare.com/s/a709a390ecd276046607.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Hypertension (HTN), defined as blood pressure elevated beyond its normal range, is a major public health concern because of its rising prevalence and impact among adults worldwide [1–3]. It is one of the most common serious chronic non-communicable diseases. Hypertensive people are affected by different types of cardiovascular diseases (CVDs), e.g., coronary heart disease, stroke, peripheral arterial disease, aortic disease, and myocardial infarction [4–7], which are leading causes of disability, morbidity, and mortality and increase the economic burden of out-of-pocket expenditures (OOPE) [8–10]. According to the World Health Organization (WHO), around 9.4 million people die from HTN worldwide every year [10]. According to Belay et al. (2022), the global prevalence of HTN was 26% in 2000, and the number of adults with HTN was projected to reach around 1.56 billion (29.2%) by 2025 [11]. The latest WHO estimate, from 2021, indicated that about one-third (31.1%) of the world's adult population (1.39 billion) had HTN, two-thirds of whom lived in low- and middle-income countries (LMICs) [12]. Also, a systematic analysis of population-based studies from 90 countries, including Ethiopia, estimated that HTN among adults was more prevalent in LMICs (31.5%) than in high-income countries (28.5%) [13]. Epidemiological studies in Ethiopia have reported HTN prevalence ranging from 7.7% to 41.9% [14]. Moreover, HTN is disproportionately prevalent, and increasing alarmingly, in resource-poor countries such as Ethiopia [11]. Identifying HTN patients and their interpretable risk factors at an early stage may help to mitigate, manage, and control the risk of HTN. Early detection of HTN patients, together with the identification of interpretable risk factors, therefore plays a key role in enabling timely prevention and intervention.
It is therefore essential to detect HTN and identify its interpretable risk factors at an early stage.

Many empirical studies have identified risk factors associated with HTN in LMICs, including Ethiopia [15–21]. Nevertheless, these association studies had several limitations. Most importantly, they relied on traditional models, such as logistic regression (LR) and the Cox proportional hazards model, to identify significantly associated risk factors of HTN [22–24]. Moreover, real data with high-dimensional, non-linear patterns pose a challenge to traditional linear models, and the low precision of linear models impedes their use at the patient level. To overcome these limitations with complex real data, machine learning (ML), which is widely used in current public health research, may be an appropriate choice. ML is a subset of artificial intelligence (AI) in which algorithms learn from previous experience and/or detect patterns in data to accomplish a task, typically classification or identification [25–28]. ML offers several advantages, including automated processing; reliable probabilistic estimation that can uncover hidden patterns or relationships with high accuracy; lower labor costs and time when large amounts of data must be analyzed to support decision-making or inference; and model interpretability [29–31]. Among the different types of learning algorithms in ML, supervised learning is the most popular and widely applicable. The goal of a supervised learning algorithm is to use a dataset to build a model that can predict the system's output for new inputs. The two major types of supervised learning are regression and classification. Examples of regression include linear regression and logistic regression [32]. Examples of classification include ensemble methods, decision trees (DT), k-nearest neighbors (kNN), support vector machine (SVM), Naïve Bayes (NB), and artificial neural network (ANN) [32, 33].
An ensemble method is a machine learning technique that combines multiple models built with the same learning algorithm to achieve better predictive performance [34]. Ensemble methods include eXtreme gradient boosting (XGB), AdaBoost, histogram-based gradient boosting classification trees, and random forest (RF) [25]. Previously, some researchers developed multivariable prediction models using several ML and explainable artificial intelligence methods [35–37]. Most existing risk prediction models were developed with a limited number of risk factors, which yielded lower accuracy for predicting HTN patients [35, 38]. Although DT and ensemble approaches have attracted great attention in recent years for identifying individuals at risk of HTN, there is no evidence that these algorithms have been successfully applied in Ethiopian clinical settings.

To the best of our knowledge, this is the first study to build a predictive model using ML algorithms for predicting the individual risk of HTN in Ethiopia. Thus, the objective of this study was to develop an efficient, ensemble-based, explainable ML framework for predicting patients at risk of HTN in Ethiopia.

Furthermore, we employed under-sampling and the adaptive synthetic (ADASYN) class balancing strategy to enhance the confidence scores of the developed prediction models. For model interpretation, we identified the key risk factors of HTN and the direction of the relationship between each risk factor and HTN using SHapley Additive exPlanations (SHAP), a post hoc model interpretation technique theoretically grounded in the Shapley value. The overall pipeline of the explainable machine learning framework is displayed in Fig 1.


Fig 1. Workflow of the proposed ML-based methodology for predicting risk of HTN.

https://doi.org/10.1371/journal.pone.0289613.g001

The remainder of this paper is organized as follows. The Materials and methods section describes the data source, statistical analysis, feature selection, machine learning algorithms, performance evaluation criteria, and model interpretability. The results are presented in section 3 and discussed in section 4. Finally, conclusions are presented in section 5.

Materials and methods

Data source

The community-based cross-sectional data used in this investigation were collected in 2017 by the Hawassa city administration and made publicly available by Paulose et al. [39]. The data were collected through multistage random sampling and comprised 633 respondents aged 31 to 90 years who had resided in the city for at least six months. The sample size was determined using a sample size formula with a design effect of 1.5, a 95% confidence level, a 5% margin of error, 80% power, an assumed proportion of 50% (to maximize the sample size), and a 10% non-response rate [39]. Explanatory variables at different levels were included as individual risk factors of HTN, and quantitative variables were categorized following previous studies [18–20, 39]. A brief description of the included risk factors is presented in Table 1. In this study, HTN was defined according to the WHO criteria: systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg and/or current use of antihypertensive medication at the time of data collection [40]. After all records with missing values were removed, a total of 612 respondents were included in this study.
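The sample-size arithmetic and the HTN case definition can be checked with a short Python sketch. It assumes the standard single-proportion formula n₀ = z²p(1 − p)/d², multiplied by the design effect and inflated by 10% for non-response; the exact formula used in [39] is not stated here, and the 80% power term does not enter this formula. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size(p=0.5, margin=0.05, confidence=0.95, design_effect=1.5, nonresponse=0.10):
    """Single-proportion sample size, inflated for design effect and non-response (assumed formula)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95% confidence
    n0 = z ** 2 * p * (1 - p) / margin ** 2              # size under simple random sampling
    return n0 * design_effect * (1 + nonresponse)        # multistage design + 10% non-response

def is_hypertensive(systolic, diastolic, on_medication):
    """HTN case rule: SBP >= 140 mmHg and/or DBP >= 90 mmHg and/or on antihypertensive medication."""
    return systolic >= 140 or diastolic >= 90 or on_medication

n = sample_size()
print(round(n, 2), math.ceil(n))   # 633.84 634
print(is_hypertensive(150, 85, False), is_hypertensive(120, 80, False))   # True False
```

With the stated inputs the sketch gives about 634 respondents, close to the 633 in the collected data; the small gap may come from a different rounding or inflation convention.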


Table 1. Name, description, and categorization of the selected factors.

https://doi.org/10.1371/journal.pone.0289613.t001

Statistical analysis

The baseline and demographic characteristics of the participants were presented as percentages (%) for categorical data and mean ± standard deviation (SD) for continuous data. Pearson's χ²-test was used to determine the association between categorical risk factors and HTN, whereas for normally distributed continuous risk factors, the independent-samples t-test was used to examine the mean difference between the HTN groups (HTN vs. non-HTN). All tests were two-sided, and a p-value < 0.05 was considered statistically significant. Data analysis was performed using SPSS (version 27.0) and R (version 4.2.2).
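For a binary risk factor, the Pearson χ²-test reduces to a 2 × 2 table. The sketch below computes the uncorrected statistic and its 1-degree-of-freedom p-value with only the Python standard library; the counts in the example are hypothetical, not taken from Table 5. In practice, SPSS or R's chisq.test (with correct = FALSE) would be used.

```python
import math

def pearson_chi2_2x2(table):
    """Pearson chi-square test without continuity correction for a 2 x 2 table.

    table = [[a, b], [c, d]], rows = risk factor present/absent,
    columns = HTN/non-HTN. Returns (statistic, p_value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n     # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value

# Hypothetical counts: 10 of 30 exposed and 30 of 70 unexposed respondents have HTN
stat, p = pearson_chi2_2x2([[10, 20], [30, 40]])
print(round(stat, 3), round(p, 3))   # statistic 0.794, p roughly 0.37
```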

Feature selection

Feature selection (FS), also known as risk factor identification, variable selection, or subset selection in statistics and ML, selects the relevant features by removing irrelevant or redundant features from the dataset. In this study, the Boruta-based feature selection method (FSM) was adopted to identify the relevant features. Boruta is a wrapper-based feature selection method built around the random forest classifier. It has a wide range of applications and has been reported to perform better than other methods because it is unbiased and stable [41].
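The core idea of Boruta, comparing each real feature against randomly permuted "shadow" copies, can be sketched as follows. This is a simplified illustration, not the R Boruta package: to stay dependency-free it swaps random-forest importance for absolute Pearson correlation, runs a fixed number of rounds, and applies an unadjusted binomial test to each feature's number of "hits". The data are synthetic.

```python
import math
import random

def abs_corr(a, b):
    """|Pearson correlation|, used here in place of random-forest importance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else abs(cov) / math.sqrt(va * vb)

def binom_tail(k, n, upper=True):
    """P(X >= k) if upper else P(X <= k), for X ~ Binomial(n, 0.5)."""
    ks = range(k, n + 1) if upper else range(0, k + 1)
    return sum(math.comb(n, i) for i in ks) / 2 ** n

def boruta_sketch(columns, y, importance=abs_corr, n_iter=20, alpha=0.05, seed=0):
    """Count how often each feature beats the best shuffled 'shadow' feature, then test the count."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        shadows = []
        for col in columns:
            shadow = list(col)
            rng.shuffle(shadow)              # permutation removes any link with y
            shadows.append(shadow)
        threshold = max(importance(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if importance(col, y) > threshold:
                hits[j] += 1                 # a "hit": the real feature beat every shadow
    decisions = []
    for h in hits:
        if binom_tail(h, n_iter, upper=True) < alpha:
            decisions.append("confirmed")    # hits too frequent to be chance
        elif binom_tail(h, n_iter, upper=False) < alpha:
            decisions.append("rejected")     # hits too rare to be chance
        else:
            decisions.append("tentative")
    return hits, decisions

# Synthetic example: one informative feature and one pure-noise feature
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(200)]
signal = [yi + rng.gauss(0, 0.3) for yi in y]
noise = [rng.gauss(0, 1) for _ in y]
hits, decisions = boruta_sketch([signal, noise], y)
print(hits, decisions)
```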

Machine learning algorithms

This study used three different types of supervised ML algorithms for predicting patients with the risk of HTN (Table 2).


Table 2. Different machine learning algorithms with types.

https://doi.org/10.1371/journal.pone.0289613.t002

Logistic regression

Logistic regression (LR) is one of the most popular supervised ML algorithms; it is based on probability and is mainly used for classification tasks [42]. The LR model uses the logistic function to estimate the probability of the response variable (HTN or non-HTN) from one or more input features. The logistic function can be represented as follows:

$$p_j = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{k=1}^{K} \beta_k X_{kj}\right)\right]} \tag{1}$$

where p_j denotes the probability of HTN and (1−p_j) the probability of non-HTN for the j^th individual; X_kj is the k^th of the K input features of the j^th individual, β_k is the k^th regression coefficient, and β_0 is the intercept.

Eq (1) can be rewritten as the log-odds (logit)

$$\ln\left(\frac{p_j}{1 - p_j}\right) = \beta_0 + \sum_{k=1}^{K} \beta_k X_{kj} \tag{2}$$

and the odds as

$$\frac{p_j}{1 - p_j} = \exp\left(\beta_0 + \sum_{k=1}^{K} \beta_k X_{kj}\right) \tag{3}$$

If p_j ≥ 0.5, the j^th individual is classified as HTN; if p_j < 0.5, the individual is classified as non-HTN.
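The logistic model of Eqs (1)–(3) translates directly into code. The sketch assumes the usual 0.5 probability cutoff for assigning the HTN class, and its coefficients are hypothetical illustrations, not estimates from this study.

```python
import math

def predict_proba(beta0, betas, x):
    """Eq (1): probability of HTN for one individual with feature vector x."""
    z = beta0 + sum(b * xk for b, xk in zip(betas, x))   # linear predictor = log-odds, Eq (2)
    return 1.0 / (1.0 + math.exp(-z))

def classify(p, threshold=0.5):
    """Assumed decision rule: HTN when the predicted probability reaches the cutoff."""
    return "HTN" if p >= threshold else "non-HTN"

# Hypothetical coefficients: intercept, age (years), BMI (kg/m^2)
beta0, betas = -10.0, [0.08, 0.15]
p = predict_proba(beta0, betas, [60, 30])   # log-odds = -10 + 4.8 + 4.5 = -0.7
odds = p / (1 - p)                          # Eq (3): equals exp(-0.7)
print(round(p, 3), classify(p))             # 0.332 non-HTN
```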

Artificial neural network

Artificial neural network (ANN) is a non-linear modeling algorithm inspired by the structure and function of the human brain. It consists of interconnected processing nodes organized into three types of layers: input, hidden, and output. The input layer is connected to the hidden layer, and the hidden layer to the output layer, through weights that are updated during training. In this method, X = (x₁, …, x_k) is used as the input vector of the back propagation (BP) algorithm, which learns the mapping between the input features and the outcome variable. The BP algorithm propagates the error between the predicted and observed outcomes backward through the network and adjusts the weights of the hidden layers accordingly, using the non-linear sigmoid activation function [43]. The sigmoid activation function is defined as

$$f(x) = \frac{1}{1 + e^{-x}} \tag{4}$$

This procedure is repeated iteratively until the weights no longer change or the error reaches its minimum.
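A minimal backpropagation sketch with one sigmoid hidden layer is shown below. It assumes a sigmoid output unit, binary cross-entropy loss, and plain stochastic gradient descent, none of which are specified for the study's ANN (which was fitted in R via caret); the toy AND data are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))          # Eq (4)

class TinyMLP:
    """Input -> one sigmoid hidden layer -> sigmoid output, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, out

    def train_step(self, x, y, lr):
        h, out = self.forward(x)
        delta_out = out - y                      # gradient at the output for sigmoid + cross-entropy
        # propagate the error backward through the (not yet updated) output weights
        delta_h = [delta_out * self.w2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
        for j in range(len(h)):
            self.w2[j] -= lr * delta_out * h[j]
        self.b2 -= lr * delta_out
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * delta_h[j] * x[i]
            self.b1[j] -= lr * delta_h[j]

def cross_entropy(model, X, Y):
    total = 0.0
    for x, y in zip(X, Y):
        p = min(max(model.forward(x)[1], 1e-12), 1 - 1e-12)   # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(X)

def fit(model, X, Y, epochs=2000, lr=0.5, tol=1e-9):
    previous = cross_entropy(model, X, Y)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            model.train_step(x, y, lr)
        current = cross_entropy(model, X, Y)
        if abs(previous - current) < tol:        # stop once the error no longer changes
            break
        previous = current
    return model

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 0, 0, 1]                                 # toy outcome (logical AND)
net = TinyMLP(n_in=2, n_hidden=3)
before = cross_entropy(net, X, Y)
fit(net, X, Y)
after = cross_entropy(net, X, Y)
print(round(before, 3), round(after, 3))
```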

Random forest

Random forest (RF) is a popular machine learning algorithm, developed by Leo Breiman, that is widely used in classification and regression problems [44]. It is an ensemble learning algorithm that trains multiple decision trees on random subsets of the data. The RF-based model is constructed using the following steps:

Step 1: From the training data set (X_ij, i = 1, 2, …, k; j = 1, 2, …, n), draw a bootstrap sample and randomly select a subset of the risk factors.
Step 2: Build a decision tree (DT) on this subset.
Step 3: Repeat Steps 1 and 2 to grow many trees, which together form the forest.
Step 4: Collect the prediction of each tree and select the final prediction by majority voting.
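The four steps can be illustrated with a small forest built from scratch. This is a simplified sketch, not the R implementation used in the study: each tree is a depth-1 stump split on the Gini index, and the random risk factors are drawn once per tree as described in Step 1 (Breiman's RF redraws them at every split). The toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def fit_stump(X, y, features):
    """Depth-1 decision tree: best single split (lowest weighted Gini) on the given features."""
    majority = Counter(y).most_common(1)[0][0]
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:                                      # no valid split: predict the majority
        return lambda row: majority
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def fit_forest(X, y, n_trees=51, max_features=2, seed=0):
    rng = random.Random(seed)
    n, k = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):                              # Step 3: repeat
        rows = [rng.randrange(n) for _ in range(n)]       # Step 1: bootstrap sample
        feats = rng.sample(range(k), max_features)        # Step 1: random risk factors
        trees.append(fit_stump([X[i] for i in rows],      # Step 2: grow a tree
                               [y[i] for i in rows], feats))
    return trees

def predict_forest(trees, row):
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]   # Step 4: majority vote

# Hypothetical toy data: feature 0 is informative, features 1 and 2 are permuted noise
X = [[i, (i * 7) % 40, (i * 13) % 40] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]
forest = fit_forest(X, y)
print(predict_forest(forest, [2, 14, 26]), predict_forest(forest, [37, 19, 1]))
```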

Extreme gradient boosting

Extreme gradient boosting (XGB) is an efficient ensemble-based machine learning algorithm that combines decision trees with gradient boosting. It is highly adaptable and performs well in many classification problems, including HTN prediction [45]. Boosting is a learning approach that attempts to build a strong classifier from weak learners (classifiers). Here, weak and strong refer to how well a classifier's predicted classes agree with the actual classes. Classifiers are added sequentially, so that each new classifier corrects the errors of the previous ones. This procedure is repeated until the ensemble accurately predicts the class labels of the target variable in the training set.
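The sketch below illustrates boosting with XGBoost-style second-order updates for the logistic loss: each round fits a depth-1 tree to the gradients and Hessians of the current model and adds its shrunken output, so later trees correct the errors of earlier ones. It is a simplified illustration (depth-1 trees, no column subsampling, no γ pruning), not the xgboost library; the learning rate eta, the L2 penalty lam, and the toy data are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_newton_stump(X, g, h, lam):
    """Depth-1 tree chosen by the second-order split gain used in XGBoost-style boosting."""
    G, H = sum(g), sum(h)
    best = None                                        # (gain, feature, threshold, G_left, H_left)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:
            GL = sum(gi for row, gi in zip(X, g) if row[f] <= t)
            HL = sum(hi for row, hi in zip(X, h) if row[f] <= t)
            gain = 0.5 * (GL ** 2 / (HL + lam) + (G - GL) ** 2 / (H - HL + lam) - G ** 2 / (H + lam))
            if best is None or gain > best[0]:
                best = (gain, f, t, GL, HL)
    if best is None:                                   # constant features: a single leaf
        w = -G / (H + lam)
        return lambda row: w
    _, f, t, GL, HL = best
    w_left, w_right = -GL / (HL + lam), -(G - GL) / (H - HL + lam)   # optimal leaf weights
    return lambda row: w_left if row[f] <= t else w_right

def fit_boosting(X, y, n_rounds=30, eta=0.3, lam=1.0):
    margins = [0.0] * len(X)                           # initial log-odds
    trees = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]          # gradient of the log-loss
        h = [pi * (1 - pi) for pi in p]                # Hessian of the log-loss
        tree = fit_newton_stump(X, g, h, lam)
        trees.append(tree)
        margins = [m + eta * tree(row) for m, row in zip(margins, X)]   # correct earlier errors
    return eta, trees

def predict_proba(model, row):
    eta, trees = model
    return sigmoid(eta * sum(tree(row) for tree in trees))

# Hypothetical toy data: feature 0 is informative, features 1 and 2 are permuted noise
X = [[i, (i * 7) % 40, (i * 13) % 40] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]
model = fit_boosting(X, y)
print(round(predict_proba(model, [2, 14, 26]), 3), round(predict_proba(model, [37, 19, 1]), 3))
```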

Data partition and balancing

We randomly divided the whole dataset into a 70% training set [HTN: 91 (21.2%), non-HTN: 338 (78.8%)] and a 30% testing set [HTN: 39 (21.3%), non-HTN: 144 (78.7%)] using a stratified sampling procedure [46]. The class labels were imbalanced, i.e., the observations had a skewed class distribution. In classification tasks, class imbalance biases results toward the majority class of the response variable [47, 48]. Several data balancing strategies are widely used to deal with this problem; in this study, under-sampling and adaptive synthetic (ADASYN) sampling were applied to the training set. ADASYN is a generalization of the synthetic minority oversampling technique (SMOTE) that generates new minority-class samples using a weighted distribution, producing more synthetic samples for minority observations that are harder to learn [49].
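The split-then-balance procedure can be sketched with the standard library. The ADASYN function follows the published recipe (weight each minority row by the share of majority rows among its k nearest neighbours, then interpolate towards minority neighbours) but is a simplified illustration, not the R or imbalanced-learn implementation; k = 5, beta = 1, and the toy data are assumptions. The rounding rule in the split may differ by a row from the one used in the study.

```python
import math
import random

def stratified_split(y, train_frac=0.7, seed=0):
    """Indices for a train/test split that keeps each class's share (stratified sampling)."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

def random_undersample(X, y, minority=1, seed=0):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    mino = [i for i, yi in enumerate(y) if yi == minority]
    majo = rng.sample([i for i, yi in enumerate(y) if yi != minority], len(mino))
    keep = sorted(mino + majo)
    return [X[i] for i in keep], [y[i] for i in keep]

def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """ADASYN: more synthetic points for minority rows surrounded by majority rows."""
    rng = random.Random(seed)
    mino = [i for i, yi in enumerate(y) if yi == minority]
    G = (len(y) - 2 * len(mino)) * beta               # synthetic samples needed for balance

    def neighbours(i, pool):
        others = [j for j in pool if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

    # r_i = share of majority rows among the k nearest neighbours of minority row i
    r = [sum(y[j] != minority for j in neighbours(i, range(len(y)))) / k for i in mino]
    if G <= 0 or sum(r) == 0:
        return list(X), list(y)
    new_rows = []
    for i, ri in zip(mino, r):
        g_i = round(ri / sum(r) * G)                  # per-row quota from the weighted distribution
        candidates = neighbours(i, mino)              # interpolate towards minority neighbours
        for _ in range(g_i if candidates else 0):
            z = rng.choice(candidates)
            lam = rng.random()
            new_rows.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
    return list(X) + new_rows, list(y) + [minority] * len(new_rows)

y_full = [1] * 130 + [0] * 482                        # class sizes reported for the full data set
train_idx, test_idx = stratified_split(y_full)
print(sum(y_full[i] for i in train_idx), len(train_idx), len(test_idx))

# Hypothetical 2-feature toy data: every 5th row belongs to the minority class
X_toy = [[float(i), 0.0] for i in range(25)]
y_toy = [1 if i % 5 == 0 else 0 for i in range(25)]
X_bal, y_bal = adasyn(X_toy, y_toy)
print(y_bal.count(1), y_bal.count(0))                 # roughly balanced classes
```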

Cross validation and tune hyperparameters

Besides the parameters learned from the data, the four ML algorithms mentioned above (LR, ANN, RF, and XGB) have hyperparameters, i.e., parameters that the user sets explicitly before the learning process to improve model performance. A grid search with repeated 10-fold (K10) cross-validation was used to tune the hyperparameter values on the training set. In the K10 protocol, the training set is divided in a 9:1 ratio into a training subset and a validation set. The caret package (version 6.0-93) in R was used to obtain the optimal hyperparameter values for the four models, which are displayed in Table 3.
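The structure of a grid search with repeated K-fold cross-validation is sketched below. It is an illustration of the protocol, not caret's implementation: the folds here are not stratified (caret's fold generation stratifies on a factor outcome), and the 1-D k-nearest-neighbour scorer with its n_neighbors grid is a hypothetical stand-in for the four models used in the study.

```python
import itertools
import random
import statistics

def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (train_idx, val_idx); each repeat reshuffles the rows into k folds (9:1 for k=10)."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, folds[f]

def grid_search(X, y, grid, fit_and_score, k=10, repeats=3, seed=0):
    """Return the hyperparameter combination with the best mean validation score."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for tr, va in repeated_kfold(len(X), k, repeats, seed):
            scores.append(fit_and_score(params,
                                        [X[i] for i in tr], [y[i] for i in tr],
                                        [X[i] for i in va], [y[i] for i in va]))
        mean_score = statistics.mean(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def knn_accuracy(params, X_tr, y_tr, X_va, y_va):
    """Hypothetical stand-in model: 1-D k-nearest-neighbour classifier scored by accuracy."""
    k = params["n_neighbors"]
    correct = 0
    for x, target in zip(X_va, y_va):
        nearest = sorted(range(len(X_tr)), key=lambda i: abs(X_tr[i] - x))[:k]
        vote = 1 if 2 * sum(y_tr[i] for i in nearest) > k else 0   # majority vote
        correct += vote == target
    return correct / len(y_va)

X = [i / 10 for i in range(60)]
y = [1 if x > 3.0 else 0 for x in X]
best, score = grid_search(X, y, {"n_neighbors": [1, 3, 5]}, knn_accuracy)
print(best, round(score, 3))
```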


Table 3. The value of hyperparameter for ML-based models.

https://doi.org/10.1371/journal.pone.0289613.t003

Performance evaluation criteria

The performance of the four selected ML models was evaluated using five popular criteria: accuracy, precision, recall, F1-score, and area under the curve (AUC). These criteria were calculated from the four cells of the confusion matrix (Table 4):

True positive (t_p): the model predicted HTN and the actual class was HTN,
False positive (f_p): the model predicted HTN and the actual class was non-HTN,
False negative (f_n): the model predicted non-HTN and the actual class was HTN,
True negative (t_n): the model predicted non-HTN and the actual class was non-HTN.


Table 4. Confusion matrix.

https://doi.org/10.1371/journal.pone.0289613.t004

Accuracy.

Accuracy assesses the overall correctness of a model. It is defined as the ratio of correctly classified cases (t_p and t_n) to the total number of cases:

$$\text{Accuracy} = \frac{t_p + t_n}{t_p + f_p + f_n + t_n} \tag{5}$$

Precision.

Precision, also called the positive predictive value, is the ratio of t_p cases to all cases predicted as positive (HTN). It assesses how reliable the model's positive predictions are:

$$\text{Precision} = \frac{t_p}{t_p + f_p} \tag{6}$$

Recall.

Recall, also called sensitivity or the true positive rate (TPR), is the ratio of t_p cases to all actual positive (HTN) cases. A model with high recall has few f_n cases:

$$\text{Recall} = \frac{t_p}{t_p + f_n} \tag{7}$$

F1-score.

The F1-score is the harmonic mean of precision and recall:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$
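Eqs (5)–(8) can be computed directly from predicted and observed labels. The sketch codes HTN as the positive class (1) and reports undefined ratios as 0.0, which is a convention chosen here, not one stated in the study; the counts in the example are hypothetical, not the study's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (t_p, f_p, f_n, t_n) with HTN coded as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Eqs (5)-(8); undefined ratios (zero denominators) are reported as 0.0."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, not the study's results
m = classification_metrics(tp=30, fp=10, fn=5, tn=55)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.85, 'precision': 0.75, 'recall': 0.857, 'f1': 0.8}
```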

Area under the curve

The AUC is defined as the integral of the receiver operating characteristic (ROC) curve over the full range of false positive rates and is used to assess the quality of the predictive model:

$$\text{AUC} = \int_0^1 \text{TPR}\, d(\text{FPR}) \tag{9}$$

A ROC curve plots the TPR (sensitivity) on the y-axis against the false positive rate (FPR, 1 − specificity) on the x-axis for different cutoff values. The ROC curve is widely used in medical diagnosis, and its AUC provides a single-number summary of the predictive validity of an ML-based model [50]. The AUC ranges from 0 to 1.
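Eq (9) is evaluated in practice from a finite set of cutoffs. The sketch below builds the empirical ROC points by lowering the cutoff through every distinct predicted score and integrates them with the trapezoidal rule; it also shows the equivalent pairwise (Mann–Whitney) reading of the AUC. The labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the cutoff through every distinct score."""
    P = sum(y_true)
    N = len(y_true) - P
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == cutoff:   # tied scores move together
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / N, tp / P))
    return points

def auc_trapezoid(points):
    """Eq (9) approximated by the trapezoidal rule over the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_pairwise(y_true, scores):
    """Probability that a random HTN case scores above a random non-HTN case (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]          # hypothetical predicted probabilities of HTN
print(auc_trapezoid(roc_points(y_true, scores)), auc_pairwise(y_true, scores))   # 0.75 0.75
```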

Model interpretability

SHapley Additive exPlanations (SHAP) is an interpretability and visualization approach built on Shapley values. It was introduced by Lundberg and Lee (2017) and is widely used to explain local and global importance by computing the contribution (SHAP value) of each risk factor to an ML-based prediction model [51]. SHAP originates from coalitional game theory, in which each predictor is treated as an individual player in a game (coalition). The Shapley value framework distributes the model outcome fairly among the players and satisfies a set of desirable properties (axioms), including consistency, efficiency, dummy, and additivity [52]. Owing to the efficiency property, SHAP has been reported to provide more reliable results than other methods, such as local interpretable model-agnostic explanations (LIME) [53]. Risk factors contribute to the model's prediction with different magnitudes and signs, both of which are captured by the Shapley values; a Shapley value thus estimates the magnitude of a feature's contribution and its direction (sign). Risk factors with positive SHAP values push the model toward predicting HTN, whereas risk factors with negative SHAP values push it toward predicting non-HTN (control). Specifically, the importance of the k^th risk factor is measured by the Shapley value

$$\phi_k(v) = \sum_{S \subseteq M \setminus \{k\}} \frac{|S|!\,(|M| - |S| - 1)!}{|M|!}\left[v(S \cup \{k\}) - v(S)\right] \tag{10}$$

where M is the full set of risk factors; S ranges over all subsets of M that do not include the k^th risk factor (S ⊆ M\{k}); S ∪ {k} is the subset containing the risk factors in S together with the k^th risk factor; and v(S) is the output of the ML-based model explained using only the risk factors in S.
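Eq (10) can be evaluated exactly for a handful of risk factors by enumerating every coalition. The sketch below does this for a hypothetical value function in which the "model output" rises only when age and weight are both known; real SHAP tooling instead derives v(S) from the fitted model (e.g. TreeSHAP for XGB) and avoids full enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values, Eq (10), by enumerating every coalition S of the other players."""
    m = len(players)
    phi = {}
    for k in players:
        others = [p for p in players if p != k]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for S in combinations(others, size):
                S = frozenset(S)
                total += weight * (v(S | {k}) - v(S))   # marginal contribution of k to S
        phi[k] = total
    return phi

# Hypothetical value function: output is 1 only when age and weight are both in the coalition
def v(S):
    return 1.0 if {"age", "weight"} <= S else 0.0

phi = shapley_values(["age", "weight", "salt"], v)
print(phi)   # age and weight share the effect (about 0.5 each); salt gets 0
```

The values sum to v(M) − v(∅), which is the efficiency property mentioned above.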

Results

Baseline characteristics

This study enrolled 612 participants (HTN: 130, 21.2%; non-HTN: 482, 78.8%) with 27 HTN-related predictor variables (Table 5). About 53.4% of the respondents were male, and more than half were living in urban areas. The average age of the participants was 47.56.20±13.40 years, height 165.20±8.87 cm, and weight 66.589±8.769 kg. The prevalence of HTN was higher among obese respondents than among those with normal weight (50.0% vs. 13.4%). HTN was also more prevalent among respondents with diabetes (47.5% vs. 30.0%) and among smokers (50.4% vs. 23.8%). The prevalence of HTN was greater among respondents with a family history of diabetes (41.8% vs. 11.2%) and of HTN (60.3% vs. 21.9%). Residence, sex, age, occupation, income, PA, walking, diabetes, height, weight, BMI, smoking, drinking, vegetable, fat, salt, transport, HD, wealth, and HHTN were significantly associated with HTN (p < 0.05).


Table 5. Baseline characteristic of the respondents.

https://doi.org/10.1371/journal.pone.0289613.t005

Risk factors selection using Boruta

The result of Boruta based feature selection method is presented in Fig 2. The method showed that age, occupation, PA, walking, diabetes, height, weight, BMI, smoking, drinking, vegetable, fat, transport, HD, wealth, and HHTN were the important risk factors of HTN. The selected risk factors were included to construct the ML-based model for prediction of HTN status (HTN or non-HTN).


Fig 2. Risk factors selection using Boruta based feature selection method.

https://doi.org/10.1371/journal.pone.0289613.g002

Performance comparisons of ML-based models

The performance of the four ML-based models with under-sampling and ADASYN is shown in Table 6 and S1 Fig. The XGB model with the ADASYN balancing method achieved the highest predictive discrimination ability, with an accuracy of 88.81% (95% CI: 85.44–91.63), precision of 89.62%, recall of 97.04%, F1-score of 93.18%, and AUC of 0.894 (95% CI: 0.827–0.961).


Table 6. Performance of four models with two class balancing methods.

https://doi.org/10.1371/journal.pone.0289613.t006

The corresponding ROC curves and precision–recall curves of the four predictive models with ADASYN are displayed in Fig 3. These curves also indicated that the XGB model performed better than the other models (LR, ANN, and RF). Therefore, our results showed that the XGB-based model with ADASYN performed best among the models compared.


Fig 3.

(a) ROC curves and (b) Precision vs. recall curves of four predictive models.

https://doi.org/10.1371/journal.pone.0289613.g003

Interpretable risk factors of hypertension

SHAP analysis was performed to determine the interpretable predictive risk factors of HTN for the best-performing prediction model (XGB). Fig 4(A) shows the global importance of each risk factor in the XGB-based model. Importance plots show only the global influence of each feature on the prediction; they do not indicate whether a risk factor pushes the prediction toward HTN (positive) or non-HTN (negative). For that reason, summary plots were used, which provide a global, macro-level explanation of how the input risk factors contribute to the prediction. Fig 4(B) shows the summary plot, indicating the importance, impact, original value, and correlation of the risk factors with a high risk of HTN. The effect [positive (HTN) vs. negative (non-HTN)] is shown on the x-axis. The color signifies the value of a specific risk factor: red indicates a high value and blue indicates a low value. Overall, the XGB-based model showed that age, weight, fat, income, BMI, diabetes, salt, HHTN, drinking, and smoking were the most influential risk factors for the prediction of HTN.


Fig 4. Importance of risk factors based on SHAP values.

(A) Mean absolute SHAP values, explaining global risk factor importance; (B) local explanation summary, revealing the direction of the relationship between each risk factor and the HTN outcome.

https://doi.org/10.1371/journal.pone.0289613.g004

Discussion

In this study, we investigated several ML-based algorithms to propose an explainable framework for predicting the risk of HTN in Ethiopia. We trained four ML algorithms (LR, ANN, RF, and XGB) to predict HTN using the 16 risk factors selected by the Boruta feature selection method. The performance of the developed models was compared on the testing set using accuracy, precision, recall, F1-score, and the ROC curve with its AUC value. Based on these performance measures, we proposed the XGB model as the most appropriate classifier for predicting HTN.

Several studies have used ML frameworks to predict the risk of HTN. A comparison of the present study with existing studies is presented in Table 7. Chowdhury et al. [54] proposed a system based on 18,322 respondents with 24 candidate risk factors in Canada. Before constructing the models, they applied five top-ranked FSMs to select the significant risk factors, and they adopted five models (LASSO, Elastic Net, random survival forest (RSF), gradient boosting, and the conventional Cox proportional hazards model) to predict HTN. They measured the performance of each model by the C-index. Pratiwi OA [35] applied four ML algorithms (DT, RF, GB, and LR) to predict the individual risk of HTN in Indonesia. The models were developed with the K10 protocol on the training set, and their predictive performance was measured on the testing set in terms of accuracy, precision, recall, F1-score, and AUC; LR performed marginally better than the others (AUC 0.829). Oanh and Tung [55] proposed an ML-based model to predict patients at risk of HTN in Vietnam. The models were developed on the training set using Naïve Bayes (NB), multilayer perceptron (MLP), decision tree (DT), k-nearest neighbors (kNN), SVM, and ensemble algorithms (bagging (RF), boosting, and voting), and their performance was assessed on the testing set in terms of F1-score, precision, and recall. Islam et al. [38] conducted a study in three countries: Bangladesh, Nepal, and India. They included 818,603 respondents with seven risk factors, applied GT, RF, GBM, XGB, LR, and LDA algorithms to predict HTN patients, and found that XGB achieved the best performance. Chai et al. [56] used Malaysian data with 2,461 respondents and 11 covariates to develop a system for diagnosing HTN patients with three types of algorithms: a neural network (MLP), classical models (LR, DT, NB, k-NN), and ensemble models (RF, SVM, GB, XGB, LightGBM, CatBoost, AdaBoost, and LogitBoost). Before building the models, they used a correlation-based FSM to select a set of leading features and the SMOTE technique to balance the class labels. They evaluated the predictive ability of the models on the testing set by sensitivity, specificity, accuracy, precision, F1-score, misclassification rate, and AUC, and found that the LightGBM-based model achieved the best accuracy (74.39%). Islam et al. [57] used nationally representative HTN data from Bangladesh consisting of 6,965 subjects with 13 risk factors. They identified the prominent risk factors of HTN using two popular FSMs, LASSO and SVM-RFE, then used the K10 protocol to construct models with four ML algorithms on the training set and measured their performance on the testing set using accuracy, precision, recall, F1-score, and AUC. The gradient boosting model attained the best AUC (0.669). Zheng et al. [58] developed a system for predicting HTN patients using several ML techniques in the USA, without applying any feature selection method before constructing the system; they found that the ANN model achieved the highest performance. Alkaabi et al. [59] used HTN data from Qatar comprising 987 respondents with 12 risk factors. They adopted three ML algorithms (DT, RF, and LR), and their results indicated that the RF model generalized better than the others.


Table 7. Comparative performance of the proposed study with the existing studies.

https://doi.org/10.1371/journal.pone.0289613.t007

Thus, the comparative results suggest that our proposed XGB framework can predict HTN with a higher AUC (Table 7). Moreover, SHAP analysis with the proposed method revealed that age, weight, fat, income, diabetes, BMI, height, salt, smoking, and HHTN were associated risk factors for developing HTN. The local explanation summary plot showed that age is the leading risk factor of HTN in Ethiopia. A study conducted by Belay et al. (2022) in Ethiopia found that patients aged >60 years were two times more likely to have HTN than those aged 18–40 years [11]. This result is also supported by several systematic reviews and meta-analyses [60, 61]. With older age, the arteries of the vascular system change, notably through increased large-artery stiffness. Weight and fat are the second and third leading drivers of HTN, which supports the conclusions of earlier investigations [62]. Excess body weight increases visceral and retroperitoneal fat, which can contribute to the development of HTN. Household income is linked to the risk of HTN, in line with prior investigations [63]. For a number of reasons, including the ongoing nutritional transition, rising trends toward a sedentary lifestyle, and other modifiable risk factors, people from low-income families may bear a greater burden of the disease [64]. BMI is another determinant of HTN, which is consistent with earlier studies [65]. High BMI may contribute to HTN and other cardiovascular diseases by stimulating the renin–angiotensin–aldosterone system and causing endothelial dysfunction [66]. Diabetes is another important marker of HTN; diabetes and HTN may promote each other and share common risk factors. HHTN is another important covariate of HTN, which is consistent with previous studies conducted in Ethiopia and other countries [67].
This may be because family members share genetic factors, behaviors, largely similar lifestyles, and environmental factors that could influence the risk of HTN. Additionally, other risk factors such as salt intake, alcohol drinking, and smoking were found to be important contributing risk factors of HTN, which is consistent with other studies in the literature [68, 69]. Although this work has many strengths, it also has some limitations. The sample included only permanent residents of the city administration who had lived in the area for more than six months and were older than 30 years. Additionally, the amounts of alcohol, cigarettes, fruits, vegetables, fats, and salt consumed were not measured in quantitative units.

Conclusions

In this study, we adopted four different machine learning algorithms to build the most appropriate predictive model for classifying HTN. Overall, the experimental results indicated that, among the four models, XGB is the most appropriate model for predicting patients at risk of HTN. The SHAP analysis revealed that age, weight, fat, income, BMI, diabetes, salt, HHTN, drinking, and smoking are the factors contributing most to the risk of developing HTN. Therefore, the proposed integrated system could be conveniently used as a tool in clinical settings to identify patients at risk of HTN at an early stage. This information could help clinicians make decisions that reduce healthcare costs and time, while also enabling individualized interventions and targeted treatment to minimize the burden of HTN in Ethiopia.

Supporting information

S1 Fig.

ROC curves of the four models under two class-balancing methods: (a) under-sampling and (b) ADASYN.

https://doi.org/10.1371/journal.pone.0289613.s001

(DOCX)

Acknowledgments

The authors would like to thank the PLOS ONE editor and reviewers for their valuable comments and suggestions, which improved the quality of the manuscript.

References

1. Mills KT, Stefanescu A, He J. The global epidemiology of hypertension. Nature Reviews Nephrology. 2020;16(4):223–37. pmid:32024986
2. GBD 2017 Risk Factor Collaborators. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392:1923–94. pmid:30496105
3. GBD 2017 Causes of Death Collaborators. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392:1736–88. pmid:30496103
4. Gupta R, Xavier D. Hypertension: the most important non communicable disease risk factor in India. Indian heart journal. 2018;70(4):565–72. pmid:30170654
5. Fuchs FD, Whelton PK. High blood pressure and cardiovascular disease. Hypertension. 2020;75(2):285–92. pmid:31865786
6. Roth GA, Mensah GA, Johnson CO, Addolorato G, Ammirati E, Baddour LM, et al. Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the GBD 2019 study. Journal of the American College of Cardiology. 2020;76(25):2982–3021. pmid:33309175
7. Rapsomaniki E, Timmis A, George J, Pujades-Rodriguez M, Shah AD, Denaxas S, et al. Blood pressure and incidence of twelve cardiovascular diseases: lifetime risks, healthy life-years lost, and age-specific associations in 1·25 million people. The Lancet. 2014;383(9932):1899–911.
8. Sorato MM, Davari M, Kebriaeezadeh A, Sarrafzadegan N, Shibru T. Societal economic burden of hypertension at selected hospitals in southern Ethiopia: a patient-level analysis. BMJ open. 2022;12(4):e056627. pmid:35387822
9. Mehta R, Mantri N, Goel AD, Gupta MK, Joshi NK, Bhardwaj P. Out-of-pocket spending on hypertension and diabetes among patients reporting in a health-care teaching institute of the Western Rajasthan. Journal of Family Medicine and Primary Care. 2022;11(3):1083. pmid:35495832
10. Berek PA, Irawati D, Hamid AY. Hypertension: A global health crisis. Ann Clin Hypertens. 2021;5:8–11.
11. Belay DG, Fekadu H, Molla MD, Chekol HA, Adugna DG, Melese E, et al. Prevalence and associated factors of hypertension among adult patients attending the outpatient department at the primary hospitals of Wolkait tegedie zone, Northwest Ethiopia. Frontiers in Neurology. 2022;13:943595. pmid:36034276
12. Mamdouh H, Alnakhi WK, Hussain HY, Ibrahim GM, Hussein A, Mahmoud I, et al. Prevalence and associated risk factors of hypertension and pre-hypertension among the adult population: findings from the Dubai Household Survey, 2019. BMC Cardiovascular Disorders. 2022;22(1):18. pmid:35090385
13. Tesfa E, Demeke D. Prevalence of and risk factors for hypertension in Ethiopia: A systematic review and meta‐analysis. Health Science Reports. 2021;4(3):e372. pmid:34589614
14. Anjulo U, Haile D, Wolde A. Prevalence of Hypertension and Its Associated Factors Among Adults in Areka Town, Wolaita Zone, Southern Ethiopia. Integrated Blood Pressure Control. 2021;14:43–54. pmid:33758539
15. Damtie D, Bereket A, Bitew D, Kerisew B. The prevalence of hypertension and associated risk factors among secondary school teachers in Bahir Dar City administration, Northwest Ethiopia. International Journal of Hypertension. 2021;2021:525802. pmid:33953969
16. Asresahegn H, Tadesse F, Beyene E. Prevalence and associated factors of hypertension among adults in Ethiopia: a community based cross-sectional study. BMC research notes. 2017;10:1–8.
17. Khanam R, Ahmed S, Rahman S, Al Kibria GM, Syed JR, Khan AM, et al. Prevalence and factors associated with hypertension among adults in rural Sylhet district of Bangladesh: a cross-sectional study. BMJ open. 2019;9(10):e026722. pmid:31662350
18. Matsuzaki M, Sherr K, Augusto O, Kawakatsu Y, Ásbjörnsdóttir K, Chale F, et al. The prevalence of hypertension and its distribution by sociodemographic factors in Central Mozambique: a cross sectional study. BMC public health. 2020;20:1–9.
19. Sharma JR, Mabhida SE, Myers B, Apalata T, Nicol E, Benjeddou M, et al. Prevalence of hypertension and its associated risk factors in a rural black population of Mthatha town, South Africa. International Journal of Environmental Research and Public Health. 2021;18(3):1215. pmid:33572921
20. Manios Y, Androutsos O, Lambrinou CP, Cardon G, Lindstrom J, Annemans L, et al. A school-and community-based intervention to promote healthy lifestyle and prevent type 2 diabetes in vulnerable families across Europe: design and implementation of the Feel4Diabetes-study. Public Health Nutrition. 2018;21(17):3281–90. pmid:30207513
21. Hong K, Yu ES, Chun BC. Risk factors of the progression to hypertension and characteristics of natural history during progression: A national cohort study. Plos one. 2020;15(3):e0230538. pmid:32182265
22. Chowdhury MZ, Naeem I, Quan H, Leung AA, Sikdar KC, O’Beirne M, et al. Prediction of hypertension using traditional regression and machine learning models: A systematic review and meta-analysis. Plos one. 2022;17(4):e0266334. pmid:35390039
23. Chowdhury MZ, Leung AA, Sikdar KC, O’Beirne M, Quan H, Turin TC. Development and validation of a hypertension risk prediction model and construction of a risk score in a Canadian population. Scientific Reports. 2022;12(1):12780. pmid:35896590
24. Ghosh S, Kumar M. Prevalence and associated risk factors of hypertension among persons aged 15–49 in India: a cross-sectional study. BMJ open. 2019;9(12):e029714. pmid:31848161
25. Baştanlar Y, Özuysal M. Introduction to machine learning. In: miRNomics: MicroRNA biology and computational analysis. Humana Press. 2014:105–28.
26. Ghaderzadeh M, Asadi F, Hosseini A, Bashash D, Abolghasemi H, Roshanpour A. Machine learning in detection and classification of leukemia using smear blood images: a systematic review. Scientific Programming. 2021;2021:1–4.
27. Ghaderzadeh M, Rebecca FE, Standring A. Comparing performance of different neural networks for early detection of cancer from benign hyperplasia of prostate. Applied Medical Informatics. 2013;33(3):45–54.
28. Salehnasab C, Hajifathali A, Asadi F, Parkhideh S, Kazemi A, Roshanpoor A, et al. An Intelligent Clinical Decision Support System for Predicting Acute Graft-versus-host Disease (aGvHD) following Allogeneic Hematopoietic Stem Cell Transplantation. Journal of Biomedical Physics & Engineering. 2021;11(3):345. pmid:34189123
29. Kruppa J, Liu Y, Biau G, Kohler M, Koenig IR, Malley JD, et al. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biometrical Journal. 2014;56(4):534–63. pmid:24478134
30. Garavand A, Salehnasab C, Behmanesh A, Aslani N, Zadeh AH, Ghaderzadeh M. Efficient model for coronary artery disease diagnosis: a comparative study of several machine learning algorithms. Journal of Healthcare Engineering. 2022;2022. pmid:36304749
31. Nadim K, Ragab A, Ouali MS. Data-driven dynamic causality analysis of industrial systems using interpretable machine learning and process mining. Journal of Intelligent Manufacturing. 2023;34(1):57–83.
32. Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, Inc. 2022.
33. Rezaianzadeh A, Dastoorpoor M, Sanaei M, Salehnasab C, Mohammadi MJ, Mousavizadeh A. Predictors of length of stay in the coronary care unit in patient with acute coronary syndrome based on data mining methods. Clinical Epidemiology and Global Health. 2020;8(2):383–8.
34. Kumar A, Jain M. Ensemble learning for AI developers. Apress: Berkeley, CA, USA. 2020.
35. Kurniawan R, Utomo B, Siregar KN, Ramli K, Besral B, Suhatril RJ, et al. Hypertension prediction using machine learning algorithm among Indonesian adults. IAES International Journal of Artificial Intelligence. 2023;12(2):776–84.
36. Visco V, Izzo C, Mancusi C, Rispoli A, Tedeschi M, Virtuoso N, et al. Artificial Intelligence in Hypertension Management: An Ace up Your Sleeve. Journal of Cardiovascular Development and Disease. 2023;10(2):74. pmid:36826570
37. Alsaleh MM, Allery F, Choi JW, Hama T, McQuillin A, Wu H, et al. Prediction of disease comorbidity using explainable artificial intelligence and machine learning techniques: A systematic review. International Journal of Medical Informatics. 2023;175:105088. pmid:37156169
38. Islam SM, Talukder A, Awal MA, Siddiqui MM, Ahamad MM, Ahammed B, et al. Machine Learning Approaches for Predicting Hypertension and Its Associated Factors Using Population-Level Data from Three South Asian Countries. Frontiers in Cardiovascular Medicine. 2022;9:839379. pmid:35433854
39. Paulose T, Nkosi ZZ, Endriyas M. Prevalence of hypertension and its associated factors in Hawassa city administration, Southern Ethiopia: Community based cross-sectional study. Plos one. 2022;17(3):e0264679. pmid:35231073
40. Park S. Ideal target blood pressure in hypertension. Korean Circulation Journal. 2019;49(11):1002–9. pmid:31646769
41. Pudjihartono N, Fadason T, Kempa-Liehr AW, O’Sullivan JM. A review of feature selection methods for machine learning-based disease risk prediction. Frontiers in Bioinformatics. 2022;2:927312. pmid:36304293
42. Ranganathan P, Pramesh CS, Aggarwal R. Common pitfalls in statistical analysis: logistic regression. Perspectives in clinical research. 2017;8(3):148. pmid:28828311
43. Montesinos López OA, Montesinos López A, Crossa J. Fundamentals of Artificial Neural Networks and Deep Learning. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Cham: Springer International Publishing. 2022 (pp. 379–425).
44. Breiman L. Random forests. Machine learning. 2001;45(1):5–32.
45. Guang P, Huang W, Guo L, Yang X, Huang F, Yang M, et al. Blood-based FTIR-ATR spectroscopy coupled with extreme gradient boosting for the diagnosis of type 2 diabetes: A STARD compliant diagnosis research. Medicine. 2020;99(15). pmid:32282717
46. May RJ, Maier HR, Dandy GC. Data splitting for artificial neural networks using SOM-based stratified sampling. Neural Networks. 2010;23(2):283–94. pmid:19959327
47. Thabtah F, Hammoud S, Kamalov F, Gonsalves A. Data imbalance in classification: Experimental evaluation. Information Sciences. 2020;513:429–41.
48. Buda M, Maki A, Mazurowski MA. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks. 2018;106:249–59. pmid:30092410
49. He H, Bai Y, Garcia EA, Li S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). IEEE. 2008 (pp. 1322–1328).
50. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian journal of internal medicine. 2013;4(2):627. pmid:24009950
51. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in neural information processing systems. 2017;30.
52. Shapley LS. A value for n-person games. In: Contributions to the Theory of Games (AM-28). Princeton University Press. 2016:307–18.
53. Palatnik de Sousa I, Maria Bernardes Rebuzzi Vellasco M, Costa da Silva E. Local interpretable model-agnostic explanations for classification of lymph node metastases. Sensors. 2019;19(13):2969. pmid:31284419
54. Chowdhury MZ, Leung AA, Walker RL, Sikdar KC, O’Beirne M, Quan H, et al. A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population. Scientific Reports. 2023;13(1):1–3.
55. Oanh TT, Tung NT. Predicting Hypertension Based on Machine Learning Methods: A Case Study in Northwest Vietnam. Mobile Networks and Applications. 2022;27(5):2013–23.
56. Chai SS, Goh KL, Cheah WL, Chang YH, Ng GW. Hypertension Prediction in Adolescents Using Anthropometric Measurements: Do Machine Learning Models Perform Equally Well? Applied Sciences. 2022;12(3):1600.
57. Islam MM, Rahman MJ, Roy DC, Tawabunnahar M, Jahan R, Ahmed NF, et al. Machine learning algorithm for characterizing risks of hypertension, at an early stage in Bangladesh. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 2021;15(3):877–884. pmid:33892404
58. Zheng J, Yu Z. A novel machine learning-based systolic blood pressure predicting model. Journal of Nanomaterials. 2021;2021:1–8.
59. AlKaabi LA, Ahmed LS, Al Attiyah MF, Abdel-Rahman ME. Predicting hypertension using machine learning: Findings from Qatar Biobank Study. Plos One. 2020;15(10):e0240370. pmid:33064740
60. Legese N, Tadiwos Y. Epidemiology of hypertension in Ethiopia: a systematic review. Integrated blood pressure control. 2020;13:135–43. pmid:33116810
61. Koya SF, Pilakkadavath Z, Chandran P, Wilson T, Kuriakose S, Akbar SK, et al. Hypertension control rate in India: Systematic review and meta-analysis of population-level non-interventional studies, 2001–2022. The Lancet Regional Health-Southeast Asia. 2023;9:100113. pmid:37383035
62. Solomon M, Shiferaw BZ, Tarekegn TT, GebreEyesus FA, Mengist ST, Mammo M, et al. Prevalence and Associated Factors of Hypertension Among Adults in Gurage Zone, Southwest Ethiopia, 2022. SAGE Open Nursing. 2023;9:2377960823115347. pmid:36761364
63. Qin Z, Li C, Qi S, Zhou H, Wu J, Wang W, et al. Association of socioeconomic status with hypertension prevalence and control in Nanjing: a cross-sectional study. BMC Public Health. 2022;22(1):1–9.
64. Ranzani OT, Kalra A, Di Girolamo C, Curto A, Valerio F, Halonen JI, et al. Urban-rural differences in hypertension prevalence in low-income and middle-income countries, 1990–2020: A systematic review and meta-analysis. Plos Medicine. 2022;19(8):e1004079. pmid:36007101
65. Hall JE, do Carmo JM, da Silva AA, Wang Z, Hall ME. Obesity, kidney dysfunction and hypertension: mechanistic links. Nature reviews nephrology. 2019;15(6):367–85. pmid:31015582
66. Imai Y. A personal history of research on hypertension from an encounter with hypertension to the development of hypertension practice based on out-of-clinic blood pressure measurements. Hypertension Research. 2022;45(11):1726–42. pmid:36075990
67. Mayl JJ, German CA, Bertoni AG, Upadhya B, Bhave PD, Yeboah J, et al. Association of alcohol intake with hypertension in type 2 diabetes mellitus: The ACCORD Trial. Journal of the American Heart Association. 2020;9(18):e017334. pmid:32900264
68. Nguyen TT, Nguyen MH, Nguyen YH, Nguyen TT, Giap MH, Tran TD, et al. Body mass index, body fat percentage, and visceral fat as mediators in the association between health literacy and hypertension among residents living in rural and suburban areas. Frontiers in Medicine. 2022;9. pmid:36148456
69. Choi JW, Han E, Kim TH. Risk of Hypertension and Type 2 Diabetes in Relation to Changes in Alcohol Consumption: A Nationwide Cohort Study. International Journal of Environmental Research and Public Health. 2022;19(9):4941. pmid:35564335

[ref1] 1. Mills KT, Stefanescu A, He J. The global epidemiology of hypertension. Nature Reviews Nephrology. 2020;16(4):223–37. pmid:32024986
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. GBD 2017 Risk Factor Collaborators. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018; 392:1923–94. pmid:30496105
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. GBD 2017 Causes of Death Collaborators. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392:1736–88. pmid:30496103
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Gupta R, Xavier D. Hypertension: the most important non communicable disease risk factor in India. Indian heart journal. 2018;70(4):565–72. pmid:30170654
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Fuchs FD, Whelton PK. High blood pressure and cardiovascular disease. Hypertension. 2020;75(2):285–92. pmid:31865786
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref6] 6. Roth GA, Mensah GA, Johnson CO, Addolorato G, Ammirati E, Baddour LM, et al. Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the GBD 2019 study. Journal of the American College of Cardiology. 2020;76(25):2982–3021. pmid:33309175
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref7] 7. Rapsomaniki E, Timmis A, George J, Pujades-Rodriguez M, Shah AD, Denaxas S, et al. Blood pressure and incidence of twelve cardiovascular diseases: lifetime risks, healthy life-years lost, and age-specific associations in 1·25 million people. The Lancet. 2014;383(9932):1899–911.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref8] 8. Sorato MM, Davari M, Kebriaeezadeh A, Sarrafzadegan N, Shibru T. Societal economic burden of hypertension at selected hospitals in southern Ethiopia: a patient-level analysis. BMJ open. 2022;12(4):e056627. pmid:35387822
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Mehta R, Mantri N, Goel AD, Gupta MK, Joshi NK, Bhardwaj P. Out-of-pocket spending on hypertension and diabetes among patients reporting in a health-care teaching institute of the Western Rajasthan. Journal of Family Medicine and Primary Care. 2022;11(3):1083. pmid:35495832
View Article
PubMed/NCBI
Google Scholar

[33] View Article

[34] PubMed/NCBI

[35] Google Scholar

[ref10] 10. Berek PA, Irawati D, Hamid AY. Hypertension: A global health crisis. Ann Clin Hypertens. 2021;5:8–11.
View Article
Google Scholar

[37] View Article

[38] Google Scholar

[ref11] 11. Belay DG, Fekadu H, Molla MD, Chekol HA, Adugna DG, Melese E, et al. Prevalence and associated factors of hypertension among adult patients attending the outpatient department at the primary hospitals of Wolkait tegedie zone, Northwest Ethiopia. Frontiers in Neurology. 2022;13:943595. pmid:36034276
View Article
PubMed/NCBI
Google Scholar

[40] View Article

[41] PubMed/NCBI

[42] Google Scholar

[ref12] 12. Mamdouh H, Alnakhi WK, Hussain HY, Ibrahim GM, Hussein A, Mahmoud I, et al. Prevalence and associated risk factors of hypertension and pre-hypertension among the adult population: findings from the Dubai Household Survey, 2019. BMC Cardiovascular Disorders. 2022;22(1):18. pmid:35090385
View Article
PubMed/NCBI
Google Scholar

[44] View Article

[45] PubMed/NCBI

[46] Google Scholar

[ref13] 13. Tesfa E, Demeke D. Prevalence of and risk factors for hypertension in Ethiopia: A systematic review and meta‐analysis. Health Science Reports. 2021;4(3):e372. pmid:34589614
View Article
PubMed/NCBI
Google Scholar

[48] View Article

[49] PubMed/NCBI

[50] Google Scholar

[ref14] 14. Anjulo U, Haile D, Wolde A. Prevalence of Hypertension and Its Associated Factors Among Adults in Areka Town, Wolaita Zone, Southern Ethiopia. Integrated Blood Pressure Control. 2021;14:43–54. pmid:33758539
View Article
PubMed/NCBI
Google Scholar

[52] View Article

[53] PubMed/NCBI

[54] Google Scholar

[ref15] 15. Damtie D, Bereket A, Bitew D, Kerisew B. The prevalence of hypertension and associated risk factors among secondary school teachers in Bahir Dar City administration, Northwest Ethiopia. International Journal of Hypertension. 2021;2021:525802. pmid:33953969
View Article
PubMed/NCBI
Google Scholar

[56] View Article

[57] PubMed/NCBI

[58] Google Scholar

[ref16] 16. Asresahegn H, Tadesse F, Beyene E. Prevalence and associated factors of hypertension among adults in Ethiopia: a community based cross-sectional study. BMC research notes. 2017;10:1–8.
View Article
Google Scholar

[60] View Article

[61] Google Scholar

[ref17] 17. Khanam R, Ahmed S, Rahman S, Al Kibria GM, Syed JR, Khan AM, et al. Prevalence and factors associated with hypertension among adults in rural Sylhet district of Bangladesh: a cross-sectional study. BMJ open. 2019;9(10):e026722. pmid:31662350
View Article
PubMed/NCBI
Google Scholar

[63] View Article

[64] PubMed/NCBI

[65] Google Scholar

[ref18] 18. Matsuzaki M, Sherr K, Augusto O, Kawakatsu Y, Ásbjörnsdóttir K, Chale F, et al. The prevalence of hypertension and its distribution by sociodemographic factors in Central Mozambique: a cross sectional study. BMC public health. 2020;20:1–9.
View Article
Google Scholar

[67] View Article

[68] Google Scholar

[ref19] 19. Sharma JR, Mabhida SE, Myers B, Apalata T, Nicol E, Benjeddou M, et al. Prevalence of hypertension and its associated risk factors in a rural black population of Mthatha town, South Africa. International Journal of Environmental Research and Public Health. 2021;18(3):1215. pmid:33572921
View Article
PubMed/NCBI
Google Scholar

[70] View Article

[71] PubMed/NCBI

[72] Google Scholar

[ref20] 20. Manios Y, Androutsos O, Lambrinou CP, Cardon G, Lindstrom J, Annemans L, et al. A school-and community-based intervention to promote healthy lifestyle and prevent type 2 diabetes in vulnerable families across Europe: design and implementation of the Feel4Diabetes-study. Public Health Nutrition. 2018;21(17):3281–90. pmid:30207513
View Article
PubMed/NCBI
Google Scholar

[74] View Article

[75] PubMed/NCBI

[76] Google Scholar

[ref21] 21. Hong K, Yu ES, Chun BC. Risk factors of the progression to hypertension and characteristics of natural history during progression: A national cohort study. Plos one. 2020;15(3):e0230538. pmid:32182265
View Article
PubMed/NCBI
Google Scholar

[78] View Article

[79] PubMed/NCBI

[80] Google Scholar

[ref22] 22. Chowdhury MZ, Naeem I, Quan H, Leung AA, Sikdar KC, O’Beirne M, et al. Prediction of hypertension using traditional regression and machine learning models: A systematic review and meta-analysis. Plos one. 2022;17(4):e0266334. pmid:35390039
View Article
PubMed/NCBI
Google Scholar

[82] View Article

[83] PubMed/NCBI

[84] Google Scholar

[ref23] 23. Chowdhury MZ, Leung AA, Sikdar KC, O’Beirne M, Quan H, Turin TC. Development and validation of a hypertension risk prediction model and construction of a risk score in a Canadian population. Scientific Reports. 2022;12(1):12780. pmid:35896590
View Article
PubMed/NCBI
Google Scholar

[86] View Article

[87] PubMed/NCBI

[88] Google Scholar

[ref24] 24. Ghosh S, Kumar M. Prevalence and associated risk factors of hypertension among persons aged 15–49 in India: a cross-sectional study. BMJ open. 2019;9(12):e029714. pmid:31848161
View Article
PubMed/NCBI
Google Scholar

[90] View Article

[91] PubMed/NCBI

[92] Google Scholar

[ref25] 25. Baştanlar Y, Özuysal M. Introduction to machine learning. miRNomics: MicroRNA biology and computational analysis. Humana Press. 2014:105–28.

[ref26] 26. Ghaderzadeh M, Asadi F, Hosseini A, Bashash D, Abolghasemi H, Roshanpour A. Machine learning in detection and classification of leukemia using smear blood images: a systematic review. Scientific Programming. 2021;2021:1–4.
View Article
Google Scholar

[95] View Article

[96] Google Scholar

[ref27] 27. Ghaderzadeh M, Rebecca FE, Standring A. Comparing performance of different neural networks for early detection of cancer from benign hyperplasia of prostate. Applied Medical Informatics. 2013;33(3):45–54.
View Article
Google Scholar

[98] View Article

[99] Google Scholar

[ref28] 28. Salehnasab C, Hajifathali A, Asadi F, Parkhideh S, Kazemi A, Roshanpoor A, et al. An Intelligent Clinical Decision Support System for Predicting Acute Graft-versus-host Disease (aGvHD) following Allogeneic Hematopoietic Stem Cell Transplantation. Journal of Biomedical Physics & Engineering. 2021;11(3):345. pmid:34189123
View Article
PubMed/NCBI
Google Scholar

[101] View Article

[102] PubMed/NCBI

[103] Google Scholar

[ref29] 29. Kruppa J, Liu Y, Biau G, Kohler M, Koenig IR, Malley JD, et al. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biometrical Journal. 2014;56(4):534–63. pmid:24478134
View Article
PubMed/NCBI
Google Scholar

[105] View Article

[106] PubMed/NCBI

[107] Google Scholar

[ref30] 30. Garavand A, Salehnasab C, Behmanesh A, Aslani N, Zadeh AH, Ghaderzadeh M. Efficient model for coronary artery disease diagnosis: a comparative study of several machine learning algorithms. Journal of Healthcare Engineering. 2022;2022. pmid:36304749
View Article
PubMed/NCBI
Google Scholar

[109] View Article

[110] PubMed/NCBI

[111] Google Scholar

[ref31] 31. Nadim K, Ragab A, Ouali MS. Data-driven dynamic causality analysis of industrial systems using interpretable machine learning and process mining. Journal of Intelligent Manufacturing. 2023;34(1):57–83.
View Article
Google Scholar

[113] View Article

[114] Google Scholar

[ref32] 32. Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, Inc. 2022.

[ref33] 33. Rezaianzadeh A, Dastoorpoor M, Sanaei M, Salehnasab C, Mohammadi MJ, Mousavizadeh A. Predictors of length of stay in the coronary care unit in patient with acute coronary syndrome based on data mining methods. Clinical Epidemiology and Global Health. 2020;8(2):383–8.
View Article
Google Scholar

[117] View Article

[118] Google Scholar

[ref34] 34. Kumar A, Mayank J. Ensemble learning for AI developers. BA press: Berkeley, CA, USA. 2020.

[ref35] 35. Kurniawan R, Utomo B, Siregar KN, Ramli K, Besral B, Suhatril RJ, et al. Hypertension prediction using machine learning algorithm among Indonesian adults. IAES International Journal of Artificial Intelligence. 2023;12(2): 776–84.
View Article
Google Scholar

[121] View Article

[122] Google Scholar

[ref36] 36. Visco V, Izzo C, Mancusi C, Rispoli A, Tedeschi M, Virtuoso N, et al. Artificial Intelligence in Hypertension Management: An Ace up Your Sleeve. Journal of Cardiovascular Development and Disease. 2023;10(2):74. pmid:36826570
View Article
PubMed/NCBI
Google Scholar

[124] View Article

[125] PubMed/NCBI

[126] Google Scholar

[ref37] 37. Alsaleh MM, Allery F, Choi JW, Hama T, McQuillin A, Wu H, et al. Prediction of disease comorbidity using explainable artificial intelligence and machine learning techniques: A systematic review. International Journal of Medical Informatics. 2023;175:105088. pmid:37156169
View Article
PubMed/NCBI
Google Scholar

[128] View Article

[129] PubMed/NCBI

[130] Google Scholar

[ref38] 38. Islam SM, Talukder A, Awal MA, Siddiqui MM, Ahamad MM, Ahammed B, et al. Machine Learning Approaches for Predicting Hypertension and Its Associated Factors Using Population-Level Data from Three South Asian Countries. Frontiers in Cardiovascular Medicine. 2022;9:839379. pmid:35433854
View Article
PubMed/NCBI
Google Scholar

[132] View Article

[133] PubMed/NCBI

[134] Google Scholar

[ref39] 39. Paulose T, Nkosi ZZ, Endriyas M. Prevalence of hypertension and its associated factors in Hawassa city administration, Southern Ethiopia: Community based cross-sectional study. Plos one. 2022;17(3):e0264679. pmid:35231073
View Article
PubMed/NCBI
Google Scholar

[136] View Article

[137] PubMed/NCBI

[138] Google Scholar

[ref40] 40. Park S. Ideal target blood pressure in hypertension. Korean Circulation Journal. 2019;49(11):1002–9. pmid:31646769
View Article
PubMed/NCBI
Google Scholar

[140] View Article

[141] PubMed/NCBI

[142] Google Scholar

[ref41] 41. Pudjihartono N, Fadason T, Kempa-Liehr AW, O’Sullivan JM. A review of feature selection methods for machine learning-based disease risk prediction. Frontiers in Bioinformatics. 2022;2:927312. pmid:36304293

42. Ranganathan P, Pramesh CS, Aggarwal R. Common pitfalls in statistical analysis: logistic regression. Perspectives in Clinical Research. 2017;8(3):148. pmid:28828311

43. Montesinos López OA, Montesinos López A, Crossa J. Fundamentals of Artificial Neural Networks and Deep Learning. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Cham: Springer International Publishing; 2022. pp. 379–425.

44. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.

45. Guang P, Huang W, Guo L, Yang X, Huang F, Yang M, et al. Blood-based FTIR-ATR spectroscopy coupled with extreme gradient boosting for the diagnosis of type 2 diabetes: A STARD compliant diagnosis research. Medicine. 2020;99(15). pmid:32282717

46. May RJ, Maier HR, Dandy GC. Data splitting for artificial neural networks using SOM-based stratified sampling. Neural Networks. 2010;23(2):283–94. pmid:19959327

47. Thabtah F, Hammoud S, Kamalov F, Gonsalves A. Data imbalance in classification: Experimental evaluation. Information Sciences. 2020;513:429–441.

48. Buda M, Maki A, Mazurowski MA. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks. 2018;106:249–259. pmid:30092410

49. He H, Bai Y, Garcia EA, Li S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE; 2008. pp. 1322–1328.

50. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian Journal of Internal Medicine. 2013;4(2):627. pmid:24009950

51. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 2017;30.

52. Shapley LS. 17. A value for n-person games. In: Contributions to the Theory of Games (AM-28). Princeton University Press; 2016. pp. 307–318.

53. Palatnik de Sousa I, Maria Bernardes Rebuzzi Vellasco M, Costa da Silva E. Local interpretable model-agnostic explanations for classification of lymph node metastases. Sensors. 2019;19(13):2969. pmid:31284419

54. Chowdhury MZ, Leung AA, Walker RL, Sikdar KC, O’Beirne M, Quan H, et al. A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population. Scientific Reports. 2023;13(1):1–3.

55. Oanh TT, Tung NT. Predicting Hypertension Based on Machine Learning Methods: A Case Study in Northwest Vietnam. Mobile Networks and Applications. 2022;27(5):2013–23.

56. Chai SS, Goh KL, Cheah WL, Chang YH, Ng GW. Hypertension Prediction in Adolescents Using Anthropometric Measurements: Do Machine Learning Models Perform Equally Well? Applied Sciences. 2022;12(3):1600.

57. Islam MM, Rahman MJ, Roy DC, Tawabunnahar M, Jahan R, Ahmed NF, et al. Machine learning algorithm for characterizing risks of hypertension, at an early stage in Bangladesh. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 2021;15(3):877–884. pmid:33892404

58. Zheng J, Yu Z. A novel machine learning-based systolic blood pressure predicting model. Journal of Nanomaterials. 2021;2021:1–8.

59. AlKaabi LA, Ahmed LS, Al Attiyah MF, Abdel-Rahman ME. Predicting hypertension using machine learning: Findings from Qatar Biobank Study. PLoS One. 2020;15(10):e0240370. pmid:33064740

60. Legese N, Tadiwos Y. Epidemiology of hypertension in Ethiopia: a systematic review. Integrated Blood Pressure Control. 2020;13:135–43. pmid:33116810

61. Koya SF, Pilakkadavath Z, Chandran P, Wilson T, Kuriakose S, Akbar SK, et al. Hypertension control rate in India: Systematic review and meta-analysis of population-level non-interventional studies, 2001–2022. The Lancet Regional Health-Southeast Asia. 2023;9:100113. pmid:37383035

62. Solomon M, Shiferaw BZ, Tarekegn TT, GebreEyesus FA, Mengist ST, Mammo M, et al. Prevalence and Associated Factors of Hypertension Among Adults in Gurage Zone, Southwest Ethiopia, 2022. SAGE Open Nursing. 2023;9:2377960823115347. pmid:36761364

63. Qin Z, Li C, Qi S, Zhou H, Wu J, Wang W, et al. Association of socioeconomic status with hypertension prevalence and control in Nanjing: a cross-sectional study. BMC Public Health. 2022;22(1):1–9.

64. Ranzani OT, Kalra A, Di Girolamo C, Curto A, Valerio F, Halonen JI, et al. Urban-rural differences in hypertension prevalence in low-income and middle-income countries, 1990–2020: A systematic review and meta-analysis. PLoS Medicine. 2022;19(8):e1004079. pmid:36007101

65. Hall JE, do Carmo JM, da Silva AA, Wang Z, Hall ME. Obesity, kidney dysfunction and hypertension: mechanistic links. Nature Reviews Nephrology. 2019;15(6):367–85. pmid:31015582

66. Imai Y. A personal history of research on hypertension from an encounter with hypertension to the development of hypertension practice based on out-of-clinic blood pressure measurements. Hypertension Research. 2022;45(11):1726–42. pmid:36075990

67. Mayl JJ, German CA, Bertoni AG, Upadhya B, Bhave PD, Yeboah J, et al. Association of alcohol intake with hypertension in type 2 diabetes mellitus: The ACCORD Trial. Journal of the American Heart Association. 2020;9(18):e017334. pmid:32900264

68. Nguyen TT, Nguyen MH, Nguyen YH, Nguyen TT, Giap MH, Tran TD, et al. Body mass index, body fat percentage, and visceral fat as mediators in the association between health literacy and hypertension among residents living in rural and suburban areas. Frontiers in Medicine. 2022;9. pmid:36148456

69. Choi JW, Han E, Kim TH. Risk of Hypertension and Type 2 Diabetes in Relation to Changes in Alcohol Consumption: A Nationwide Cohort Study. International Journal of Environmental Research and Public Health. 2022;19(9):4941. pmid:35564335
