A Genetic algorithm aided hyper parameter optimization based ensemble model for respiratory disease prediction with Explainable AI

Balraj Preet Kaur; Harpreet Singh; Rahul Hans; Sanjeev Kumar Sharma; Chetna Sharma; Md. Mehedi Hassan

doi:10.1371/journal.pone.0308015

Abstract

In the current era, a great deal of research is being done on disease diagnosis using machine learning. COVID-19, one of the deadliest respiratory diseases of recent times, causes serious damage to the lungs and has claimed many lives globally. Machine learning-based systems can assist clinicians in the early diagnosis of the disease, which can reduce its deadly effects. For the successful deployment of these systems, hyperparameter optimization and feature selection are important issues. Motivated by this, we design an improved model to predict the existence of respiratory disease among patients by incorporating hyperparameter optimization and feature selection. To optimize the parameters of the machine learning algorithms, hyperparameter optimization with a genetic algorithm is proposed, and to reduce the size of the feature set, feature selection is performed using the binary grey wolf optimization algorithm. Moreover, to enhance the efficacy of the predictions made by the hyperparameter-optimized machine learning models, an ensemble model is proposed using a stacking classifier. Explainable AI was also incorporated to determine feature importance using SHapley Additive exPlanations (SHAP) values. For the experimentation, the publicly accessible Mexico clinical dataset of COVID-19 was used. The results show that the proposed model has superior prediction accuracy in comparison to its counterparts. Moreover, among the hyperparameter-optimized algorithms, AdaBoost outperformed all the others. Several performance metrics, including accuracy, precision, recall, AUC, and F1-score, were used to assess the results.

Citation: Kaur BP, Singh H, Hans R, Sharma SK, Sharma C, Hassan MM (2024) A Genetic algorithm aided hyper parameter optimization based ensemble model for respiratory disease prediction with Explainable AI. PLoS ONE 19(12): e0308015. https://doi.org/10.1371/journal.pone.0308015

Editor: Ren Qi, University of Electronic Science and Technology of China, CHINA

Received: April 3, 2024; Accepted: July 16, 2024; Published: December 2, 2024

Copyright: © 2024 Kaur et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data are available at: www.kaggle.com/marianarfranklin/mexico-covid19-clinical-data/.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors declare no conflict of interest.

1. Background and rationale

Respiratory diseases are one of the main causes of mortality worldwide. COVID-19, a major respiratory disease that has claimed many lives globally, caused one of the most disastrous pandemics of this century. Global lockdowns and social distancing were new notions for the world's population, and living with these constraints was a challenge that changed lifestyles all around the world. The symptoms of the disease vary from person to person depending upon the immunity of the body. The virus spreads to the respiratory tract and causes grave damage to the lungs, which leads to serious breathing problems, reduces the oxygen level, and can require ventilator support for survival. COVID-19 alone has claimed over 6.5 million lives worldwide, and diseases like chronic obstructive pulmonary disease (COPD) claim millions of lives every year [1]. Fig 1 shows the mortality due to respiratory diseases in the U.S. alone in 2020.

Fig 1. Mortality in U.S. in the year 2020 due to respiratory diseases.

https://doi.org/10.1371/journal.pone.0308015.g001

While the world was battling COVID-19, every possible avenue was explored to find a solution to this deadly disease [2]. Swift diagnosis is one of the most important factors in reducing the mortality rate of the pandemic. Various diagnostic mechanisms, including real-time reverse transcriptase-polymerase chain reaction (RT-PCR), were considered, but they proved to be time-consuming [3]. Furthermore, medical imaging systems such as computed tomography (CT) and X-ray can aid in swift diagnosis, and new possibilities including artificial intelligence and big data have been explored to control the spread of the pandemic [4].

In recent times, machine learning-based computer-aided diagnosis systems have come up as one of the most significant domains of research that assist radiologists in the accurate diagnosis of disease using medical images.

The ability of machine learning to adapt and learn from new data has enabled researchers to continuously refine strategies for managing and mitigating the impact of the disease, showcasing the potential of technology in addressing global health challenges [5]. Through the analysis of vast datasets, machine learning algorithms can be employed to predict the disease. Machine learning models have also been instrumental in developing diagnostic tools, such as predictive models for early detection of COVID-19 based on symptoms or imaging data [6].

In this context, this study aspires to utilize machine learning approaches and other clinical variables in the patient’s data for the development of a predictive model that can identify individuals with the existence of respiratory disease at an early stage and distinguish them from those who are healthy. Therefore, the main objective of this research is to assess and compare the outcome of the proposed model employing various hyperparameter tuning techniques with other state-of-the-art machine learning models.

1.1 Motivation

In recent times, machine learning has emerged as one of the most promising domains of research, proving its capability in the development of computer-aided diagnosis (CAD) systems for various diseases [7]. However, for the reliable deployment of these models, there is considerable room for improvement in two aspects: parameter tuning, which selects the model parameters that lead to better results, and feature selection, which reduces the dimensionality of the dataset. This research considers the problem of respiratory disease classification and, with the aim of improving the performance of existing machine learning algorithms, tunes the parameters of the algorithms and performs feature selection.

1.2 Problem identification

In this research, the authors aim to get to the bottom of two problems that must be solved for the successful deployment of machine learning-based systems: parameter tuning and feature selection. Each machine learning algorithm involves a different number and type of operations that use different parameters [8]. For the successful deployment of these algorithms, the values of these parameters must be tuned to achieve better classification accuracy. Parameter tuning is regarded as an optimization problem that searches for the set of parameter values yielding the best accuracy [9]. The second problem is feature selection, in which the authors reduce the dimensionality of the dataset by keeping the most pertinent features and removing the redundant ones, with the aim of increasing classification accuracy. Feature selection is regarded as a multi-objective optimization problem with two objectives, viz. maximizing classification accuracy and minimizing the number of features [10].

Both problems are complex optimization problems of different natures and must be addressed differently to find the best solutions within a reasonable time frame. With these goals in mind, an integrated system is required that addresses both issues simultaneously and gives better classification accuracy.

1.3 Challenges and limitations of existing machine learning approaches in disease diagnosis

Machine learning (ML) has shown significant promise in disease diagnosis by automating and enhancing various aspects of the diagnostic process. However, there are several challenges and limitations that currently affect the efficacy and reliability of these approaches.

Data quantity and quality.—High-quality, labeled medical data is scarce due to privacy constraints and the difficulty of obtaining sufficient cases for rare diseases, leading to data imbalances that bias machine learning models. Additionally, errors and inconsistencies in medical data from manual entry and diagnostic inaccuracies introduce noise, impairing model performance [11].
Model related challenges.—Machine learning models in healthcare often overfit to training data, leading to poor generalization to new patients and variability across different settings, limiting their robustness. Many models, especially deep learning ones, are "black boxes," making their decisions difficult to interpret, which hampers clinical trust. Additionally, these models can perpetuate biases from training data, resulting in unfair treatment across diverse patient groups and raising ethical concerns about equity and fairness in medical outcomes [12].
Implementation Challenges.—Integrating machine learning models into clinical workflows presents challenges, including the need for significant changes in how clinicians operate and manage data, alongside potential resistance from users due to trust issues and concerns about job security. ML models require extensive validation in clinical settings, a process that is costly and time-consuming, compounded by complex regulatory requirements [13].
Other Challenges.—Training and deploying machine learning models, especially deep learning ones, demands significant computational resources, which can be a constraint for many healthcare facilities, particularly when real-time processing is required. Despite advancements, manual feature engineering remains essential for capturing domain-specific knowledge, a process that is both labor-intensive and dependent on expertise. Selecting relevant features from complex medical data is also crucial but challenging for model performance.

This study integrates advanced machine learning techniques with a framework based on SHapley Additive exPlanations (SHAP) to address the limitations mentioned above, significantly enhancing the accuracy of COVID-19 diagnostic predictions. Genetic algorithms (GAs) are employed for hyperparameter optimization due to their efficiency and effectiveness in locating optimal solutions [14]. By simulating the principles of natural selection, GAs thoroughly explore the hyperparameter search space, which helps in developing superior machine learning models with a high likelihood of reaching the global minimum and avoiding local minima [15]. For feature selection, the binary grey wolf algorithm is used, drawing inspiration from the behavior of grey wolves when rounding up and hunting prey. The algorithm incorporates four types of grey wolves (alpha, beta, delta, and omega) to emulate the leadership hierarchy [16]. The optimization process follows the three main steps of hunting: searching for prey, encircling prey, and attacking prey, which are applied to enhance the model’s performance.

1.4 Research contributions

The contribution of the proposed research is fourfold, as summarized in the points below.

Firstly, the algorithms’ hyperparameter tuning is proposed to enhance the efficacy of the machine learning classification algorithms.
Secondly, an ensemble learning model is developed considering the performance of the various parameter-tuned classification algorithms.
Thirdly, a nature-inspired metaheuristic algorithm is used to select the most relevant features, reducing the dimensionality of the dataset and increasing the classification accuracy within a reasonable time.
Lastly, to comprehensively analyze the observations’ prediction outcomes and interpret the justification behind the model’s classification decisions, SHAP analysis is performed.

1.5 Structuring of the paper

The rest of the article is structured as follows. Section 2 presents a concise overview of state-of-the-art research in the domain of disease detection using machine learning. The proposed model is presented in section 3. Section 4 describes the hyperparameter tuning with the Genetic algorithm. Section 5 briefly describes the dataset considered in this research. Section 6 presents experimental results and discussions. Feature importance using Explainable AI (SHAP Analysis) is discussed in section 7. Section 8 briefly presents the conclusions and future work.

2. Literature survey

This section summarizes the applications of machine learning in the domain of disease diagnosis, more specifically the diagnosis of COVID-19. Alali et al. [17] developed a highly efficient Gaussian process regression (GPR) model to forecast the number of COVID-19 cases. The authors employed Bayesian optimization to fine-tune the hyperparameters of the GPR in their model. Yank et al. [18] focused on enhancing the hyperparameters of well-known machine learning algorithms. Kumar et al. [19] presented an enhanced machine learning paradigm for the early detection of this illness. Modern Harris hawks optimization (HHO) algorithms based on random forest (HHORF), light gradient boosting (HHOLGB), extreme gradient boosting (HHOXGB), categorical boosting (HHOCAT) and support vector classifier (HHOSVC) were used to optimize the hyperparameters of the machine learning algorithms.

Mohsen et al. [20] used the generalized weighted ensemble with internally tuned hyperparameters (GEMITH) as a nested optimization-based technique that considers the tuning of hyperparameters and determining optimal weights for combining ensembles. Moreover, a heuristic approach was utilized to generate diverse and effective base learners, while Bayesian search was employed to expedite the optimization procedure.

Mohana et al. [21] applied deep learning techniques to 350 images from X-ray datasets. Histogram equalization was used for image preprocessing, and convolutional neural network architectures such as ResNet-50 and VGG-16 were used for image classification. The results indicated that VGG-16 achieves greater test and train precision. Hyperparameter optimization was then used to further improve the VGG-16 results.

Soufiane et al. [22] presented the effectiveness of five different machine learning algorithms, namely Random Forest, AdaBoost, XGBoost, SVM and Decision Tree. In the first experiment, each model was trained and evaluated with default parameters. In the second experiment, the authors employed grid search to identify each model’s ideal configuration on a collection of anonymous individuals with or without COVID-19. Aljouie et al. [23] employed four widely used machine learning methods, along with three data balancing approaches and feature selection techniques. Mohammad et al. [24] used a variety of machine learning techniques to predict the mortality rate among COVID-19 patients.

Many researchers have considered the use of feature selection techniques [25] for the diagnosis of the disease, which have been summarized in this section. Mehrdad et al. [26] introduced a new method for diagnosing COVID-19 that combines feature selection with random forest. The proposed method enhances the feature space, simplifies complexity, and provides clinicians with a decision tree-like analysis, facilitating easier explanation. Experimental results demonstrated that the developed prediction model surpassed existing methods and baseline algorithms in terms of performance. Fatih et al. [27] presented a novel approach for detecting COVID-19 automatically, employing a combination of fused dynamic exemplar pyramid feature extraction and hybrid feature selection techniques using deep learning. Extensive testing on various datasets demonstrated the method’s ability to achieve a high level of accuracy in detecting COVID-19. Chattopadhyay et al. [28] created various methods for COVID-19 detection, but only a few of them produced acceptable findings. The study makes two contributions, i.e., extracting deep features from the image dataset before introducing a totally new feature selection method called Clustering-based Golden Ratio Optimizer (CGRO).

Kenway et al. [29] suggested a framework divided into three linked stages. Initially, features are extracted from CT images using the Convolutional Neural Network (CNN) known as AlexNet. Next, a feature selection method called Guided Whale Optimization (Guided WOA), which is based on Stochastic Fractal Search (SFS), is employed. Pramanik et al. [30] proposed a computer-aided diagnosis (CAD) system for detecting pneumonia from chest X-rays, employing deep learning and a metaheuristic algorithm. The approach involved extracting deep features from a pre-trained ResNet50 model, which is fine-tuned on a specific pneumonia dataset. The proposed method is evaluated using well-known UCI datasets, gene expression datasets based on microarray analysis, and a dataset for predicting COVID-19. Yagin et al. [31] used machine learning techniques, specifically the XGBoost algorithm, to classify and assess COVID-19 patients based on genomic biomarkers. The model aims to provide a clear interpretation of individualized and overall risk estimation for COVID-19, aiding physicians in understanding the impact of key genomic features. The study highlights the importance of external validation, integration of clinical risk factors, and the need for multi-center trials to enhance the predictive accuracy of the model. Additionally, the use of Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) frameworks improves the accuracy of COVID-19 diagnostic prediction and aids in explaining predictions to clinicians. Hamal et al. [32] used machine learning models to classify COVID-19-associated lung changes from X-ray images. The research evaluates various models and identifies VGG-19 with data augmentation as the top performer, achieving high precision, recall, and F1 scores for COVID-19, pneumonia, and healthy individuals. The study emphasizes the importance of image pre-processing, tuning, and augmentation in enhancing model performance. Héberger et al. [33] address common errors in statistical modeling and focus on the significance of using performance parameters correctly. The study highlights the importance of distinguishing between linear and nonlinear models in modeling processes and involves a multicriteria decision-making process to compare various modeling equations and optimization algorithms. It emphasizes the role of variance analysis in detecting outliers and underscores the necessity of data preprocessing. Table 1 presents a comparison of the prominent techniques in the literature.

Table 1. Comparison of key techniques in the literature.

https://doi.org/10.1371/journal.pone.0308015.t001

3. Proposed model

Machine learning has become one of the most significant domains of research and has applications in various fields. For the successful deployment of machine learning models, certain unaddressed issues, as mentioned in the problem identification section, can be considered for improvement.

In this light, the authors aspire to use hyperparameter tuning and feature selection to enhance the efficacy of machine learning models. Fig 2 presents the primary steps in developing the proposed model.

Fig 2. Proposed methodology.

https://doi.org/10.1371/journal.pone.0308015.g002

Step I- Preprocessing

This step aims to balance the dataset using upsampling, as the COVID-negative cases constitute only 10.5% of the entire dataset, whereas positive samples make up 89.5% (refer to section 5). Following this, attributes in the metadata that are not related to the study goal are removed, such as id, ID_Registro, Pecho_Acc, ABR_INT, Fecha_actulization, Ingreso, Fecha_DEF, Pias_origen and naciolandad. Additionally, RESULTADO is taken as the class attribute, containing the COVID "yes" and COVID "no" labels.

As shown in Fig 3, before upsampling the COVID-positive class accounted for about 90 percent of the samples; after upsampling, as shown in Fig 4, the "yes" and "no" classes contain an equal number of samples. Subsequent analyses and outcomes are based on this balanced dataset.

Fig 3. Before upsampling.

https://doi.org/10.1371/journal.pone.0308015.g003

Fig 4. After upsampling.

https://doi.org/10.1371/journal.pone.0308015.g004

Fig 5 compares the counts of COVID-positive (Class 1) and COVID-negative (Class 0) cases in the original and upsampled datasets and shows that the upsampled dataset is balanced. This balance is critical as it ensures equal representation of both classes, thereby enhancing the performance and reliability of the machine learning models. The graph underscores the effectiveness of upsampling in addressing class imbalance, a key factor in improving predictive accuracy and reducing model bias. The Mexico dataset was selected due to its extensive records on respiratory diseases and symptoms, which are highly correlated with COVID-19, providing a robust basis for analysis and comparison.

Fig 5. Comparison between original and upsampled dataset.

https://doi.org/10.1371/journal.pone.0308015.g005
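
The upsampling in Step I can be illustrated with a minimal sketch of random oversampling, in which minority-class rows are duplicated at random until every class matches the majority class. The paper does not state which upsampling variant was used, so the function `upsample_minority`, the toy rows, and the `age` field are assumptions made only for illustration; only the `RESULTADO` label name comes from the text.

```python
import random
from collections import Counter

def upsample_minority(rows, label_key, seed=0):
    """Randomly resample (with replacement) each smaller class up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy data with a 90/10 imbalance similar to the one described (values are illustrative).
rows = [{"RESULTADO": 1, "age": 30 + i} for i in range(90)]
rows += [{"RESULTADO": 0, "age": 50 + i} for i in range(10)]

balanced = upsample_minority(rows, "RESULTADO")
counts = Counter(row["RESULTADO"] for row in balanced)
print(counts)
```

After the call, both classes hold the same number of rows, which is the situation Fig 4 depicts.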

Step II- Data splitting

The dataset is split into training and test sets in a 70:30 ratio. Two experiments were run. In the first, the models were trained and tested with their default hyperparameters, the success measures were computed, and the confusion matrix was created. In the second, classification results were obtained with hyperparameter tuning.
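
A 70/30 split of this kind might be sketched as follows. Stratifying by class (keeping the class proportions equal in both subsets) is an assumption here, since the text only gives the ratio; the helper name `stratified_holdout` and the toy labels are likewise hypothetical.

```python
import random

def stratified_holdout(labels, test_fraction=0.3, seed=42):
    """Return (train_idx, test_idx), keeping class proportions in both subsets."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy balanced label vector (illustrative only).
labels = [1] * 100 + [0] * 100
train_idx, test_idx = stratified_holdout(labels)
print(len(train_idx), len(test_idx))
```

The returned index lists can then be used to slice the feature matrix and label vector into the train and test sets.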

Step III- Classification algorithm

The presented system used seven classifiers: AdaBoost, Random Forest, Extra Trees, Decision Tree, Gradient Boosting Classifier, KNN and Light Gradient Boosting Machine.

Step IV- Hyperparameter tuning

Many real-world machine learning applications rely heavily on hyperparameter optimization, since tuning the hyperparameters of an algorithm can improve its performance. Genetic algorithms, random search, Bayesian optimization, and grid search are used as optimization methods. The hyperparameters tuned for the classifiers are as follows.

LightGBM: num leaves, bagging fraction, feature fraction, learning rate, max depth, subsample, colsample tree, max bin, min child samples.

Adaboost: subsample, colsample tree, gamma, max depth, min child weight, learning rate, alpha.

Random Forest: n estimator, criterion, max depth, min sample split.

The classification performance can be improved by carefully choosing (tuning) the values of the hyperparameters. When the tuning is carried out by an optimization algorithm, the whole process is treated as an optimization problem.

Step V- Building and model analysis

The performance was evaluated using the confusion matrix and several metrics, including precision, accuracy, area under the curve, error rate, balanced accuracy score, cross-validation score, Kappa index, and F1-score. A 2×2 confusion matrix (CM) is used in the proposed hyperparameter optimization-based ML method to assess the model with these metrics. The correctly classified results lie along the main diagonal of the matrix. Higher values of the metrics, excluding the error rate, indicate a more effective model.
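
To make the relation between the 2×2 confusion matrix and these metrics concrete, the sketch below derives several of them from the four cell counts using their standard definitions. The function name `binary_metrics` and the example counts are hypothetical and are not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common metrics from the cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # share of cases on the main diagonal
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
    }

# Hypothetical counts, chosen only to exercise the formulas.
m = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The sketch omits guards against empty rows or columns of the matrix, which a production implementation would need.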

Step VI- Feature selection

The proposed research uses one of the latest nature-inspired metaheuristic algorithms, "grey-wolf optimizer" that imitates the natural command structure and foraging strategy of grey wolves [37] for feature selection. The algorithm searches the space of features to find the best features from the original set of features with an aim to maximize the accuracy of prediction and minimize the number of features selected. Fig 6 presents the feature selection process considered in this research. Using feature selection, one can determine the crucial features and eliminate the unnecessary (redundant) ones from the dataset [38]. For various machine learning applications, the feature selection goals include reducing data dimensionality, enhancing prediction performance, and providing good data understanding [39].

Fig 6. Feature selection process.

https://doi.org/10.1371/journal.pone.0308015.g006
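
The feature selection idea of Step VI can be sketched as a small binary grey wolf optimizer. This is a simplified illustration rather than the authors' implementation: each wolf is a 0/1 mask over the features, the three best wolves act as alpha, beta and delta, and a sigmoid transfer function (one common choice) turns the averaged leader estimate into a selection probability. The fitness function is a toy stand-in that weighs a made-up error term against the fraction of selected features; the names `INFORMATIVE`, `ALPHA`, `fitness`, `transfer` and `binary_gwo`, and all numeric settings, are assumptions.

```python
import math
import random

N_FEATURES = 12
INFORMATIVE = {0, 3, 5, 8}   # hypothetical ground truth, used only by the toy fitness
ALPHA = 0.99                 # assumed weight on error versus subset size

def fitness(mask):
    """Toy stand-in for classifier error plus a penalty on the share of selected features."""
    selected = {i for i, bit in enumerate(mask) if bit}
    if not selected:
        return 1.0           # an empty subset is treated as the worst case
    error = 1.0 - len(selected & INFORMATIVE) / len(INFORMATIVE)
    return ALPHA * error + (1 - ALPHA) * len(selected) / N_FEATURES

def transfer(x):
    """Sigmoid transfer function mapping a continuous estimate to a probability."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))

def binary_gwo(n_wolves=10, n_iter=30, seed=1):
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)[:]
    history = [fitness(best)]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter                   # decreases linearly from 2 toward 0
        leaders = sorted(wolves, key=fitness)[:3]    # alpha, beta, delta
        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for d in range(N_FEATURES):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * D
                prob = transfer(estimate / 3)        # average of the three leader estimates
                new_pos.append(1 if rng.random() < prob else 0)
            new_wolves.append(new_pos)
        wolves = new_wolves
        candidate = min(wolves, key=fitness)
        if fitness(candidate) < fitness(best):       # keep the best mask seen so far
            best = candidate[:]
        history.append(fitness(best))
    return best, history

best_mask, history = binary_gwo()
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("best fitness:", round(history[-1], 4))
```

In a real pipeline, `fitness` would train and validate a classifier on the columns selected by the mask, so that the error term reflects actual prediction performance.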

Step VII- Ensemble model

In the proposed model, the stacking method [40] is used for ensemble learning to enhance the predictive efficacy of the machine learning models. This approach combines several base models to construct a more robust meta-model that capitalizes on the distinct capabilities of each base model.

The fundamental concept behind stacking is to incorporate one or more meta-level models, which accept the predictions of multiple base models as inputs and generate the final prediction. The greatest precision is achieved when AdaBoost, KNN, and Random Forest are combined, as shown in Fig 7. The mathematical formulation [41, 42] is given in Eq (1), where y is the output of the stacking classifier obtained by combining the three best-performing machine learning models.

Fig 7. Ensemble model architecture.

https://doi.org/10.1371/journal.pone.0308015.g007
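
The data flow of stacking described in Step VII can be sketched without any external library. The base models below are deliberately simple stand-ins (two threshold rules and a small k-nearest-neighbour scorer) and do not reproduce the AdaBoost, KNN and Random Forest combination of the paper; the meta-model is a logistic regression, which is also an assumption. The meta-model is fitted on a held-out part of the training data so that it learns from base-model outputs on rows the base models have not seen. All data are synthetic and all function names are hypothetical.

```python
import math
import random

def fit_stump(X, y, feature):
    """Base model: threshold rule on one feature, chosen to maximize training accuracy."""
    best_thr, best_acc = 0.0, -1.0
    for thr in sorted({row[feature] for row in X}):
        acc = sum((row[feature] > thr) == bool(label) for row, label in zip(X, y)) / len(y)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return lambda row: 1.0 if row[feature] > best_thr else 0.0

def fit_knn(X, y, k=5):
    """Base model: fraction of positive labels among the k nearest training points."""
    def predict(row):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], row))[:k]
        return sum(y[i] for i in nearest) / k
    return predict

def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Meta-model: logistic regression on base-model outputs, fitted by gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, label in zip(Z, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            for j, zj in enumerate(z):
                grad_w[j] += (p - label) * zj
            grad_b += p - label
        w = [wi - lr * g / len(y) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(y)
    return lambda z: 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0

# Synthetic two-feature data: the label is 1 when the features sum to more than 1.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(300)]
y = [1 if x0 + x1 > 1 else 0 for x0, x1 in X]
X_base, y_base = X[:150], y[:150]        # used to fit the base models
X_meta, y_meta = X[150:225], y[150:225]  # used to fit the meta-model
X_test, y_test = X[225:], y[225:]        # used only for evaluation

base_models = [fit_stump(X_base, y_base, 0), fit_stump(X_base, y_base, 1), fit_knn(X_base, y_base)]

def base_outputs(rows):
    """Stack the base-model outputs into one feature vector per row."""
    return [[model(row) for model in base_models] for row in rows]

meta = fit_logistic(base_outputs(X_meta), y_meta)
stacked_pred = [meta(z) for z in base_outputs(X_test)]
accuracy = sum(p == t for p, t in zip(stacked_pred, y_test)) / len(y_test)
print("stacked test accuracy:", round(accuracy, 3))
```

Libraries such as scikit-learn typically generate the meta-model's training inputs with cross-validated predictions instead of a single held-out block; the single split is used here only to keep the sketch short.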

Step VIII- Performance evaluation

A performance assessment procedure enables the evaluation of precision and efficiency. There are numerous methods for evaluating classifiers. In this study, we used the holdout technique, which partitions the dataset into two separate subsets: a train set and a test set, comprising 70% and 30% of the dataset, respectively. The model was trained on the train set, and its predictive capability was then assessed on the unseen test set [43]. To mitigate overfitting in the proposed model, a feature selection process was employed to eliminate noise and remove features that were either redundant or of minimal importance for prediction accuracy. An ensemble modeling approach was also adopted to further reduce overfitting. Ensemble methods enhance model performance by combining multiple weak learners, which collectively produce more accurate and robust results. By leveraging multiple models to analyze the data, ensemble techniques help make the final predictions more reliable and precise. In addition, we employed cross-validation to guard against overfitting. Finally, we computed assessment metrics including the F1 score, ROC, recall, accuracy, and precision [44].

4. Hyperparameter tuning using Genetic algorithm

Genetic algorithms can optimize a machine learning algorithm’s hyperparameters by systematically exploring potential hyperparameter combinations. Fig 8 shows the structure of genetic hyperparameter tuning on a machine learning algorithm [45].

Fig 8. The framework of Genetic algorithm for hyper-parameter optimization.

https://doi.org/10.1371/journal.pone.0308015.g008

Start by identifying the hyperparameters to fine-tune the machine learning model, such as learning rates or layer sizes. For each hyperparameter, specify the range or values it can take.

Population initialization

Begin with an initial set of hyperparameter combinations, forming a population of candidates. A population comprises individuals or solutions, typically referred to as chromosomes. Each chromosome is composed of a series of genes, where each gene, or multiple genes, depending on the encoding method, represents a single decision variable that will be applied to the objective functions. Various parameter combinations lead to diverse fitness values in vectors. Random mutations in the parameters are introduced within the population, and vectors with higher fitness levels outlive their counterparts [46].

Evaluation phase.

Train and assess a machine learning model for each candidate hyperparameter set. Use a performance metric (e.g., accuracy, error) to quantify how well each model performs.

Selection.

Choose a subset of candidates (called parents) for the next generation based on their model performance. Candidates who achieve better results are more likely to be selected. Common selection methods include random sampling or ranking candidates.

Crossover (Recombination).

Pair up the selected parents and generate new candidate hyperparameter sets (offspring) by merging their hyperparameter values. Crossover can involve blending or swapping hyperparameter values between parent candidates to create offspring.

Mutation.

Introduce small, random changes to hyperparameters in some offspring candidates. This introduces diversity into the population. Mutation helps explore the hyperparameter space more extensively.

Population update.

Replace some existing candidates with the newly generated offspring candidates. The selection for replacement is often based on the fitness (performance) of the candidates. This step ensures the population size remains consistent.

Termination conditions.

Specify when to stop the optimization process. This can be based on a maximum number of iterations, a performance threshold, or a time limit.

Final result.

The hyperparameter set that results in the best model performance during the optimization process is considered the optimal configuration [47].

The algorithm for generating hyperparameters using the genetic algorithm is as shown in Table 2.

Table 2. Algorithm for generating hyperparameter.

https://doi.org/10.1371/journal.pone.0308015.t002

Here’s a breakdown of how this process works.

Generate Initial Population. Begin by creating an initial set of machine learning (ML) models with randomly selected hyperparameters.
Evaluate Loss Function. Determine the loss function for each model, such as log-loss, to measure their performance.
Select Top Models. Identify and select a subset of models with the lowest error rates.
Create Offspring. Develop a new population of ML models by generating offspring from the top-performing models of the previous generation, making slight adjustments to their hyperparameters. Combine these offspring with models from the previous generation and new models in a specific ratio, for instance, 50/50.
Iterate the Process. Calculate the loss function for the new population, rank the models, and repeat the process for multiple generations.

Genetic algorithms, while powerful, require careful specification of the loss function, population size, and the ratio of offspring with modified parameters [48].
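
The generational loop outlined above can be sketched as follows. This is not the algorithm of Table 2: the search space, the population split (top quarter kept, half offspring, the rest new random candidates), the mutation rate, and above all the `loss` function, a toy stand-in for training a model and measuring its validation loss, are assumptions chosen only to make the example self-contained.

```python
import random

SEARCH_SPACE = {             # hypothetical ranges, chosen only for illustration
    "n_estimators": (10, 300),
    "max_depth": (1, 20),
    "learning_rate": (0.01, 1.0),
}

def random_candidate(rng):
    """Draw one hyperparameter set uniformly from the search space."""
    return {
        "n_estimators": rng.randint(*SEARCH_SPACE["n_estimators"]),
        "max_depth": rng.randint(*SEARCH_SPACE["max_depth"]),
        "learning_rate": rng.uniform(*SEARCH_SPACE["learning_rate"]),
    }

def loss(c):
    """Toy stand-in for a validation loss; a real run would train and score a model here."""
    return (((c["n_estimators"] - 150) / 300) ** 2
            + ((c["max_depth"] - 8) / 20) ** 2
            + (c["learning_rate"] - 0.1) ** 2)

def crossover(p1, p2, rng):
    """Take each hyperparameter from one of the two parents at random."""
    return {key: rng.choice((p1[key], p2[key])) for key in p1}

def mutate(candidate, rng, rate=0.3):
    """Nudge some hyperparameters by up to 10% of their range, clipped to the bounds."""
    child = dict(candidate)
    for key, (low, high) in SEARCH_SPACE.items():
        if rng.random() < rate:
            span = (high - low) * 0.1
            value = min(high, max(low, child[key] + rng.uniform(-span, span)))
            child[key] = round(value) if isinstance(low, int) else value
    return child

def genetic_search(pop_size=20, generations=25, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=loss)                      # rank by loss, best first
        history.append(loss(population[0]))
        parents = population[: pop_size // 4]          # keep the top quarter unchanged
        offspring = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                     for _ in range(pop_size // 2)]
        newcomers = [random_candidate(rng)
                     for _ in range(pop_size - len(parents) - len(offspring))]
        population = parents + offspring + newcomers
    best = min(population, key=loss)
    history.append(loss(best))
    return best, history

best, history = genetic_search()
print(best)
print("best loss per generation (first, last):", round(history[0], 4), round(history[-1], 4))
```

Because the top candidates are carried over unchanged, the best loss can never get worse from one generation to the next; the population size, the offspring ratio and the loss function are the settings the text identifies as needing careful specification.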

5. Dataset description

The dataset [49] contains 41 columns, which include clinical data as well as the RT-PCR test result. Some of the 41 columns are not required for the analysis and are therefore omitted from the dataset. The eliminated non-relevant fields include the patient’s ID, city name, and patient registration date, as well as nine additional columns. Table 3 lists the 29 most relevant columns, consisting of personal attributes and clinical features, together with their descriptions [49]. Out of all the samples in the dataset, subjects with initial respiratory problems were considered for the study, which amounts to about 14964 patient samples [49]. In the attribute values, 1 means yes and 0 means no.

Table 3. Dataset description.

https://doi.org/10.1371/journal.pone.0308015.t003

6. Experimental results and discussions

This section presents the experimental results obtained after applying Binary Grey Wolf Optimization for feature selection on the dataset. Feature selection can significantly affect machine learning model performance by enhancing predictive accuracy, reducing computational complexity, improving interpretability, and supporting robust generalization. Its exploration and exploitation strategies enable the selection of good feature subsets, contributing to more efficient, reliable, and scalable machine learning models [50]. The results obtained with the proposed model are then compared with other state-of-the-art machine learning algorithms, namely decision tree, Adaboost, random forest, gradient boosting, light gradient boosting, extra tree, logistic regression, ridge classifier, linear discriminant analysis, naïve Bayes, K-nearest neighbor, and support vector machine, using evaluation metrics such as accuracy, precision, recall, F1-score, Kappa statistic, MCC, and time required [51].

To validate the results, 3-fold cross-validation is used. Table 4 summarizes the results obtained for the algorithms and metrics mentioned above. The results indicate that the decision tree classifier outperforms the other algorithms across the evaluation metrics.
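
As an illustration of this evaluation protocol, the following sketch runs 3-fold cross-validation with scikit-learn and collects the metrics named above. It is not the authors' code: the synthetic data from `make_classification` stands in for the clinical dataset, and the classifier settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import cohen_kappa_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 29-attribute clinical dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=29, n_informative=8, random_state=0
)

# Metrics reported in the text; Kappa and MCC are wrapped as scorers.
scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "f1": "f1",
    "kappa": make_scorer(cohen_kappa_score),
    "mcc": make_scorer(matthews_corrcoef),
}

cv_results = cross_validate(
    DecisionTreeClassifier(random_state=0), X, y, cv=3, scoring=scoring
)

# Average each metric over the three folds; fit_time approximates "time required".
summary = {name: cv_results[f"test_{name}"].mean() for name in scoring}
summary["fit_time_s"] = cv_results["fit_time"].mean()
print(summary)
```

Swapping `DecisionTreeClassifier` for any of the other listed classifiers gives one row of a comparison like the one summarized in Table 4.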

Table 4. Results of machine learning algorithm.

https://doi.org/10.1371/journal.pone.0308015.t004

Fig 9 shows the confusion matrices and the corresponding ROC curves for the three best algorithms: Decision tree, Extra tree, and Random Forest. The decision tree model achieved a sensitivity of 96%, a specificity of 92%, and a positive likelihood ratio of 18.19. In comparison, the Extra Trees model reached a sensitivity of 95.5% and the Random Forest model a sensitivity of 94%, both slightly lower than that of the decision tree.
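
For reference, sensitivity, specificity, and the likelihood ratios follow directly from the four cells of a confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from Fig 9.

```python
def diagnostic_stats(tp, fn, fp, tn):
    """Standard diagnostic statistics from confusion-matrix counts.

    Assumes both classes are present and specificity is strictly
    between 0 and 1, otherwise a likelihood ratio is undefined.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for illustration only.
stats = diagnostic_stats(tp=80, fn=20, fp=10, tn=90)
print(stats)
```

A positive likelihood ratio well above 1 means that a positive prediction is much more likely for a diseased patient than for a healthy one.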

Fig 9. Best three classifier confusion matrix and ROC curve.

https://doi.org/10.1371/journal.pone.0308015.g009

Next, seven different algorithms were considered for hyperparameter optimization using grid search, random search, Bayesian optimization, and genetic algorithm techniques. Table 5 presents the results obtained after hyperparameter optimization of the random forest. The results indicate that the random forest tuned with the genetic algorithm outperforms the random forest tuned with the other hyperparameter optimization techniques.
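
Two of these search strategies, grid search and random search, are available directly in scikit-learn and can be sketched as follows. The data, the parameter grid, and the scoring choice are assumptions for illustration; Bayesian optimization and the genetic algorithm need additional code or libraries and are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, random_state=0
)

# Illustrative search space, not the one used in the paper.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 6, None]}

# Grid search evaluates all 6 combinations with 3-fold cross-validation.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
grid.fit(X, y)

# Random search evaluates only 4 randomly drawn combinations.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    n_iter=4,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X, y)

print("grid search  :", grid.best_params_, round(grid.best_score_, 4))
print("random search:", rand.best_params_, round(rand.best_score_, 4))
```

Grid search covers the whole grid, so on a shared grid it can never score below random search; random search trades that guarantee for fewer model fits.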

Table 5. Random forest with hyperparameter optimization.

https://doi.org/10.1371/journal.pone.0308015.t005

Fig 10 shows the confusion matrix, ROC curve, and classification report of the random forest with genetic algorithm hyperparameter optimization.

Fig 10. Best optimizer results for random forest.

https://doi.org/10.1371/journal.pone.0308015.g010

Table 6 presents the results obtained after hyperparameter optimization of the gradient boosting classifier. The results indicate that the gradient boosting classifier tuned with the genetic algorithm outperforms the same classifier tuned with the other hyperparameter optimization techniques.

Table 6. Gradient boosting classifier results.

https://doi.org/10.1371/journal.pone.0308015.t006

Fig 11 shows the classification report, confusion matrix, and ROC curve of the gradient boosting classifier with genetic algorithm hyperparameter optimization. Furthermore, Table 7 shows that the Adaboost classifier performs best with the genetic algorithm. The computation time of random search is often higher than that of the other methods owing to its time complexity.

Fig 11. Best optimizer result for gradient boosting classifier.

https://doi.org/10.1371/journal.pone.0308015.g011

Table 7. Adaboost classifier results.

https://doi.org/10.1371/journal.pone.0308015.t007

Fig 12 shows the true positive and false positive values in the confusion matrix, together with the ROC curve of the Adaboost classifier with micro average and AUC. Table 8 lists the optimized parameters of Extra Trees obtained with the various optimization algorithms and shows the improved results of Extra Trees with genetic algorithm optimization compared with the other hyperparameter optimization algorithms.

Fig 12. Best optimizer result for Adaboost classifier.

https://doi.org/10.1371/journal.pone.0308015.g012

Table 8. Results of Extra tree.

https://doi.org/10.1371/journal.pone.0308015.t008

Fig 13 shows the confusion matrix, ROC curve, and classification report of Extra Trees with genetic algorithm hyperparameter optimization, which performed better than the other optimization algorithms. Table 9 presents the results obtained after hyperparameter optimization of the Light gradient boosting machine. The results indicate that the Light gradient boosting machine tuned with grid search outperforms the same model tuned with the other hyperparameter optimization algorithms.

Fig 13. Best optimizer result for Extra tree.

https://doi.org/10.1371/journal.pone.0308015.g013

Table 9. Results of LightGBM.

https://doi.org/10.1371/journal.pone.0308015.t009

Fig 14 shows the results of the Light gradient boosting machine with grid search hyperparameter optimization as a classification report (precision, recall, and support), together with the ROC curve and confusion matrix.

Fig 14. Best optimizer results of LightGBM.

https://doi.org/10.1371/journal.pone.0308015.g014

Table 10 represents the optimized parameters of the Decision tree using various optimization algorithms. Additionally, Table 10 illustrates the improved results of the Decision tree with the genetic algorithm optimization compared to other hyperparameter optimization algorithms.

Table 10. Results of Decision tree.

https://doi.org/10.1371/journal.pone.0308015.t010

Fig 15 shows the confusion matrix, ROC curve, and classification report of the Decision tree with genetic algorithm hyperparameter optimization.

Fig 15. Best optimizer results of Decision tree.

https://doi.org/10.1371/journal.pone.0308015.g015

Table 11 presents the outcomes of the hyperparameter optimization process for KNN. The results demonstrate that the genetic algorithm gives better results for KNN than the other hyperparameter optimization algorithms considered.

Table 11. Results of KNN.

https://doi.org/10.1371/journal.pone.0308015.t011

Fig 16 represents the ROC curve and classification value report of KNN with the best hyperparameter.

Fig 16. Best optimizer results of KNN.

https://doi.org/10.1371/journal.pone.0308015.g016

Next, the dataset was reduced using the Binary Grey Wolf feature selection technique to find the most relevant features. The dataset with the selected features was then used for further testing. Table 12 presents the results of the gradient boosting classifier on the feature-selected dataset. The results in Table 12 show that the gradient boosting classifier performs best with genetic algorithm hyperparameter optimization.
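
To make the feature selection step more concrete, the sketch below implements one common sigmoid-based variant of binary grey wolf optimization, in the spirit of the approaches described in [14]. Whether the authors used this exact variant is not stated here, so the update rule, the parameter values, and the `toy_fitness` function are assumptions. In a real wrapper setting the fitness would combine the classifier's error on the selected features with the number of features kept.

```python
import math
import random


def bgwo(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Binary grey wolf optimization that minimizes `fitness` over 0/1 masks."""
    rng = random.Random(seed)
    wolves = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_wolves)
    ]
    best, best_fit = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=fitness)
        if fitness(ranked[0]) < best_fit:
            best, best_fit = ranked[0][:], fitness(ranked[0])
        leaders = ranked[:3]  # alpha, beta, and delta wolves
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 towards 0
        new_wolves = []
        for wolf in wolves:
            new = []
            for d in range(n_features):
                steps = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    steps.append(leader[d] - A * D)
                x = sum(steps) / 3.0
                # A sigmoid transfer function maps the position to a probability.
                prob = 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))
                new.append(1 if prob >= rng.random() else 0)
            new_wolves.append(new)
        wolves = new_wolves
    final = min(wolves, key=fitness)
    if fitness(final) < best_fit:
        best, best_fit = final[:], fitness(final)
    return best, best_fit


# Toy fitness (assumption): features 0, 3, and 5 are "informative";
# missing one costs 1.0 and every selected feature costs 0.1.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    missing = sum(1 for i in INFORMATIVE if mask[i] == 0)
    return missing + 0.1 * sum(mask)


mask, score = bgwo(toy_fitness, n_features=10)
print(mask, score)
```

The returned mask marks the selected columns with 1, and the reduced dataset would keep only those columns.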

Table 12. Results of gradient boosting classifier.

https://doi.org/10.1371/journal.pone.0308015.t012

Fig 17 presents the confusion matrix, ROC curve, and classification report obtained after hyperparameter optimization of the Gradient boosting classifier using the genetic algorithm.

Fig 17. Best optimizer results of gradient boosting classifier.

https://doi.org/10.1371/journal.pone.0308015.g017

Table 13 presents the results of Adaboost with the genetic algorithm, which show high accuracy values. Additionally, Fig 18 shows the confusion matrix, classification report, and ROC curve for the best optimizer’s results on Adaboost. The feature selection technique also reduces the execution time of the Extra tree under random search and Genetic Algorithm hyperparameter optimization.

Fig 18. Best optimizer results of Adaboost.

https://doi.org/10.1371/journal.pone.0308015.g018

Table 13. Results of Adaboost.

https://doi.org/10.1371/journal.pone.0308015.t013

Table 14 represents the results of hyperparameter optimization on the Extra tree. Fig 19 represents the ROC curve and classification report representing the results of the best optimizer.

Fig 19. Best optimizer results of Extra tree.

https://doi.org/10.1371/journal.pone.0308015.g019

Table 14. Results of Extra tree.

https://doi.org/10.1371/journal.pone.0308015.t014

Table 15 compares the results of the different hyperparameter optimization techniques on the light gradient boosting machine. It shows that the LightGBM model improved slightly in performance with the grid search hyperparameter optimization algorithm.

Table 15. Results of LightGBM.

https://doi.org/10.1371/journal.pone.0308015.t015

Fig 20 shows the confusion matrix and ROC curve of the LightGBM classifier with grid search hyperparameter optimization.

Fig 20. Best optimizer results of LightGBM.

https://doi.org/10.1371/journal.pone.0308015.g020

Table 16 presents the results of Random forest with different hyperparameter tuning algorithms and the corresponding Random forest parameters. Furthermore, the results of Random Forest with Bayesian Optimization demonstrate high accuracy values.

Table 16. Results of Random Forest.

https://doi.org/10.1371/journal.pone.0308015.t016

Fig 21 represents the confusion matrix, ROC curve and classification report of Random forest with the Genetic Algorithm hyperparameter optimization.

Fig 21. Best optimizer results of Random Forest.

https://doi.org/10.1371/journal.pone.0308015.g021

The efficacy of both the Decision tree and KNN models improved by 0.13% with the genetic algorithm, as shown in Tables 17 and 18.

Table 17. Results of Decision tree.

https://doi.org/10.1371/journal.pone.0308015.t017

Table 18. Results of KNN.

https://doi.org/10.1371/journal.pone.0308015.t018

Fig 22 represents the confusion matrix and ROC curve of the Decision tree classifier with Genetic Algorithm hyperparameter optimization.

Fig 22. Best optimizer results of Decision tree.

https://doi.org/10.1371/journal.pone.0308015.g022

Furthermore, Table 18 shows the better performance of the KNN algorithm with genetic algorithm hyperparameter optimization. The results indicate that KNN with 17 neighbors gives better results.

Fig 23 represents the confusion matrix of the model, ROC curve and classification report of KNN with genetic algorithm hyperparameter optimization.

Fig 23. Best optimizer results of KNN.

https://doi.org/10.1371/journal.pone.0308015.g023

Next, an ensemble model is created using the best hyperparameters and the feature-selected dataset. The ensemble is built by searching for the best combination of hyperparameter-optimized models on the feature-selected dataset, and this combination is evaluated against the other machine learning algorithms. The combination of Random forest, Adaboost, and KNN performs best compared with the other models. The accuracy of the ensemble model is 98%, which is better than that of the individual hyperparameter-optimized machine learning algorithms.
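
A stacking ensemble of these three model types can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' configuration: the data are synthetic, the hyperparameters are placeholders apart from the 17 neighbors reported for KNN, and the logistic regression meta-learner is an assumption because the final estimator is not specified in this section.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the feature-selected dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=15, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; in the paper each would carry its tuned hyperparameters.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=17)),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"stacking accuracy on held-out data: {accuracy:.3f}")
```

Training the meta-learner on out-of-fold predictions keeps it from simply memorizing base learners that have already seen the training labels.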

Fig 24 implies that the ensemble model is capable of distinguishing between positive and negative COVID-19 diagnoses. The findings show that the methods used to create the ensemble model result in more precise and reliable classification.

Fig 24. Confusion matrix of ensemble model.

https://doi.org/10.1371/journal.pone.0308015.g024

The ensemble model findings show that combining HPO-KNN, HPO-Random Forest, and HPO-Adaboost improves model performance compared with the other models in the experiment. The results also show that the feature selection method improves the model success rate on the COVID-19 dataset records. As a result, the model is trained on higher-quality data, which improves efficiency. Furthermore, the use of HPO-KNN, HPO-Random Forest, and HPO-Adaboost improves model stability. Finally, because the ensemble incorporates predictions from several classification models as its building blocks, it produces a robust prediction.

Table 19 represents the comparison of the Ensemble model with other machine learning algorithms. The results are compared with different evaluation metrics like accuracy, recall, precision and f-measure.

Table 19. Comparison table of the proposed model.

https://doi.org/10.1371/journal.pone.0308015.t019

Fig 25 shows the statistical analysis of the proposed model compared with other machine learning models. The graph displays the sensitivity, specificity, positive likelihood, and negative likelihood of each model.

Fig 25. Statistical analysis of proposed model with other model.

https://doi.org/10.1371/journal.pone.0308015.g025

Fig 26 shows the ROC comparison, in which the ensemble method ranks highest in terms of ACC, while the GBM ranks lowest. Table 20 compares related studies on predicting COVID-19 with the proposed model, which achieves the highest accuracy with an ensemble of three machine learning algorithms.

Fig 26. ROC comparison of machine learning algorithm.

https://doi.org/10.1371/journal.pone.0308015.g026

Table 20. Comparison of related studies in predicting COVID-19 diagnosis.

https://doi.org/10.1371/journal.pone.0308015.t020

7. Feature importance using Explainable AI (SHAP analysis)

AI solutions are often black boxes, which makes model explanation necessary. If even machine learning experts need tools to comprehend and explain the models they construct, the doubts and suspicions of non-technical people are legitimate. SHAP [5] is one such tool, introduced a few years ago. It can decompose the predictions of a machine learning model or deep neural network to make them intelligible. SHAP analysis explains which factors affect a model’s decisions, and how. Feature importance represents the significance of the input features in predicting a target variable [52]. Most significantly, ranking features by importance improves the efficacy and efficiency of a predictive modeling project. In this research, we used the SHAP summary plot [53].

Explainable AI (XAI) models and methods include Decision Trees, Logistic Regression, and Rule-Based Models for intrinsic interpretability, and post-hoc techniques such as LIME, SHAP, and Grad-CAM for explaining complex models. Other approaches, such as Counterfactual Explanations and Partial Dependence Plots, provide global and local insights, while advanced models such as Explainable Boosting Machines and Bayesian Rule Lists [54] combine transparency with predictive power. These XAI techniques enhance trust and accountability in AI by making its decisions understandable and transparent. The SHAP summary shown in Fig 27 has two advantages: it gives the feature ordering and the impact of each feature [55].

The position on the y-axis gives the feature ranking in decreasing order, from higher to lower importance. The SHAP values on the x-axis [56] indicate the impact of each feature: positive SHAP values indicate a direct link with the target variable, and negative values an inverse one. Additionally, red shading corresponds to higher feature values, while blue shading stands for lower feature values. The irregular and intersecting lines suggest dispersion [57]. The importance of features for any classification or prediction can be easily assessed by sorting the features in descending order, with the most important feature at the top. For example, Fig 23 shows a bar plot of the 10 most important features, with "INTUBADO" at the top. The dominant features are "INTUBADO", "EDAD", "ENTIDAD_RES", "ENTIDAD_NAC", "EMBARAZO", and so on. In comparison with the other features shown in the diagram, "DIABETES" remains hidden. Furthermore, as shown in Fig 23, higher values of "INTUBADO", "EDAD", and "ENTIDAD_RES" have negative SHAP values, indicating a negative association.

Fig 27 shows the SHAP values of the different features. Greater values of these features imply a lower probability of survival in the case of COVID-19, and vice versa. It is important to highlight that the summary plot provides a global view of the data [47]. The SHAP dependence plot and the SHAP feature interaction plot serve as tools to examine a particular feature and instance [48]. Such an analysis could determine how a single feature contributes to model performance; this is not covered in this research and is reserved for future investigations.
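
To illustrate what a SHAP value measures, the sketch below computes exact Shapley values for a tiny hand-written scoring function by enumerating all feature subsets. It does not use the SHAP library, which relies on faster approximations for real models. The `toy_model`, its coefficients, and the baseline patient are assumptions made for the example; the feature names are only borrowed from the dataset for readability.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`."""
    n = len(x)

    def value(subset):
        # Features outside the subset are replaced by their baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical survival score from (INTUBADO, EDAD, DIABETES)."""
    intubated, age, _diabetes = z
    score = 0.9 - 0.3 * intubated - 0.004 * age
    if intubated and age > 60:
        score -= 0.1  # interaction between intubation and older age
    return score


patient = [1, 70, 1]   # intubated, 70 years old, diabetic
baseline = [0, 40, 0]  # reference patient (assumption)
phi = shapley_values(toy_model, patient, baseline)
print(dict(zip(["INTUBADO", "EDAD", "DIABETES"], phi)))
```

The values sum to the difference between the patient's score and the baseline score, and a feature the model never uses receives a value of zero. A negative value means the feature pushes the score below the baseline, which is the sense in which the plots above are read.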

Fig 27. SHAP analysis.

https://doi.org/10.1371/journal.pone.0308015.g027

Fig 28 illustrates the impact of features on the model’s predictions, distinguishing between Class 0 (COVID-19 positive) and Class 1 (COVID-19 negative). The graph highlights the significance of each variable with magnitude values, showing their importance in predicting COVID-19 outcomes [58]. This visualization underscores the differential contribution of features in classifying COVID-19 status, facilitating a deeper understanding of the model’s decision-making process [23, 59].

Fig 28. SHAP analysis mean value.

https://doi.org/10.1371/journal.pone.0308015.g028

8. Conclusions and future works

This study proposes an integrated machine learning model for respiratory disease prediction, focusing on COVID-19, one of the most fatal diseases recently faced by humanity. Seven contemporary machine learning classifiers were coupled with different hyperparameter optimization techniques and a feature selection approach with the aim of enhancing prediction capability. The system’s efficacy was evaluated through diverse performance metrics such as ACC, F1-score, MCC, and Kappa index, offering insights from both patient and clinician viewpoints. The incorporation of SHAP values facilitates a comprehensive analysis of prediction outcomes for the observations. This technique of ranking input variables for identifying positive COVID-19 results helps to interpret the reasoning behind the model’s classification decisions. Furthermore, the proposed model could be extended to predict other ailments such as diabetes, asthma, and hypertension. In summary, this study not only contributes to respiratory disease prediction but also lays a foundation for broader applications in disease forecasting. The combination of machine learning techniques, clinical datasets, and optimization strategies offers a holistic approach that has the potential to revolutionize healthcare analytics. This study is limited by the absence of experiments involving datasets with missing values and by the lack of an evaluation of model performance on big data. Future work could explore the development of the model into an application integrated with Internet of Things (IoT) technologies. Furthermore, the work can be extended to use deep learning models to extract features from images and standard machine learning techniques for classification, while considering various evolutionary algorithms for feature selection.

Acknowledgments

Thanks to all authors for completing this work properly.

References

1. https://covid19.who.int/table/ (accessed December 8, 2023)
2. Mansbridge N. et al., “Feature selection and comparison of machine learning algorithms in classification of grazing and rumination behaviour in sheep,” Sensors (Switzerland), vol. 18, no. 10, pp. 1–16, 2018. pmid:30347653
3. Uddin S., Khan A., Hossain M. E., and Moni M. A., “Comparing different supervised machine learning algorithms for disease prediction,” BMC Med. Inform. Decis. Mak., vol. 19, no. 1, pp. 1–16, 2019.
4. Ernawan F., Handayani K., Fakhreldin M., and Abbker Y., “Light Gradient Boosting with Hyper Parameter Tuning Optimization for COVID-19 Prediction,” Int. J. Adv. Comput. Sci. Appl., vol. 13, no. 8, pp. 514–523, 2022.
5. Muhammad L. J. et al. 2021. “Supervised Machine Learning Models for Prediction of COVID-19 Infection Using Epidemiology Dataset.” SN Computer Science 2(1). 1–13. pmid:33263111
6. Sharma Ajay, and Pramod Kumar Mishra. 2022. “Performance Analysis of Machine Learning Based Optimized Feature Selection Approaches for Breast Cancer Diagnosis.” International Journal of Information Technology (Singapore) 14(4). 1949–60.
7. Sevinç E., “An empowered AdaBoost algorithm implementation: A COVID-19 dataset study,” Comput. Ind. Eng., vol. 165, no. December 2021, p. 107912, 2022.
8. An T. K. and Kim M. H., “A new Diverse AdaBoost classifier,” Proc.—Int. Conf. Artif. Intell. Comput. Intell. AICI 2010, vol. 1, pp. 359–363, 2010.
9. Sayed S. A. F., Elkorany A. M., and Sayed Mohammad S., “Applying Different Machine Learning Techniques for Prediction of COVID-19 Severity,” IEEE Access, vol. 9, pp. 135697–135707, 2021. pmid:34786321
10. Chowdhury N. K., Kabir M. A., Rahman M. M., and Islam S. M. S., “Machine learning for detecting COVID-19 from cough sounds: An ensemble-based MCDM method,” Comput. Biol. Med., vol. 145, no. November 2021, p. 105405, 2022.
11. Zargari Khuzani A., Heidari M., and Shariati S. A., “COVID-Classifier: an automated machine learning model to assist in the diagnosis of COVID-19 infection in chest X-ray images,” Sci. Rep., vol. 11, no. 1, pp. 1–6, 2021.
12. Sreedharan R. and Kumar A. P., “Analysis and prediction of smart data using machine learning,” AIP Conf. Proc., vol. 2240, no. Ml, pp. 15–21, 2020.
13. Hu P., Pan J. S., and Chu S. C., “Improved Binary Grey Wolf Optimizer and Its application for feature selection,” Knowledge-Based Syst., vol. 195, no. xxxx, p. 105746, 2020.
14. Emary E., Zawbaa H. M., and Hassanien A. E., “Binary grey wolf optimization approaches for feature selection,” Neurocomputing, vol. 172, pp. 371–381, 2016.
15. Ciotti M., Ciccozzi M., Terrinoni A., Jiang W. C., Bin Wang C., and Bernardini S., “The COVID-19 pandemic,” Crit. Rev. Clin. Lab. Sci., vol. 0, no. 0, pp. 365–388, 2020. pmid:32645276
16. Velavan T. P. and Meyer C. G., “The COVID-19 epidemic,” Trop. Med. Int. Heal., vol. 25, no. 3, pp. 278–280, 2020. pmid:32052514
17. Alali Y., Harrou F., and Sun Y., “A proficient approach to forecast COVID-19 spread via optimized dynamic machine learning models,” Sci. Rep., vol. 12, no. 1, pp. 1–20, 2022.
18. Yang L. and Shami A., “On hyperparameter optimization of machine learning algorithms: Theory and practice,” Neurocomputing, vol. 415, pp. 295–316, 2020.
19. Debjit K. et al., “An Improved Machine-Learning Approach for COVID-19 Prediction Using Harris Hawks Optimization and Feature Analysis Using SHAP,” Diagnostics, vol. 12, no. 5, 2022. pmid:35626179
20. Shahhosseini M., Hu G., and Pham H., “Optimizing ensemble weights and hyperparameters of machine learning models for regression problems,” Mach. Learn. with Appl., vol. 7, no. December 2021, p. 100251, 2022.
21. Mohana Saranya S., Rajalaxmi R. R., Mohanapriya S., Prasida S., and Nithyalaxmi P., “Prediction of Covid-19 Using Hyperparameter Optimized Convolutional Neural Network,” Turkish J. Comput. Math. Educ., vol. 12, no. 9, pp. 448–455, 2021.
22. S. Hamida, O. E. L. Gannour, B. Cherradi, H. Ouajji, and A. Raihani, “Optimization of machine learning algorithms hyper-parameters for improving the prediction of patients infected with COVID-19,” 2020 IEEE 2nd Int. Conf. Electron. Control. Optim. Comput. Sci. ICECOCS 2020, no. 1, 2020.
23. Aljouie Abdulrhman Fahad et al. 2021. “Early Prediction of COVID-19 Ventilation Requirement and Mortality from Routinely Collected Baseline Chest Radiographs, Laboratory, and Clinical Data with Machine Learning.” Journal of Multidisciplinary Healthcare 14. 2017–33. pmid:34354361
24. Pourhomayoun M. and Shakibi M., “Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making,” Smart Heal., vol. 20, no. April 2020, p. 100178, 2021.
25. Attallah Omneya. 2022. “An Intelligent ECG-Based Tool for Diagnosing COVID-19 via Ensemble Deep Learning Techniques.” Biosensors 12(5). pmid:35624600
26. Rostami Mehrdad, and Oussalah Mourad. 2022. “A Novel Explainable COVID-19 Diagnosis Method by Integration of Feature Selection with Random Forest.” Informatics in Medicine Unlocked 30(January). 100941. pmid:35399333
27. Ozyurt Fatih, Tuncer Turker, and Subasi Abdulhamit. 2021. “An Automated COVID-19 Detection Based on Fused Dynamic Exemplar Pyramid Feature Extraction and Hybrid Feature Selection Using Deep Learning.” Computers in Biology and Medicine 132(March). 104356. pmid:33799219
28. Chattopadhyay Soham et al. 2021. “Covid-19 Detection by Optimizing Deep Residual Features with Improved Clustering-Based Golden Ratio Optimizer.” Diagnostics 11(2). 1–27. pmid:33671992
29. El-Kenawy El Sayed M. et al. 2020. “Novel Feature Selection and Voting Classifier Algorithms for COVID-19 Classification in CT Images.” IEEE Access 8. pmid:34976558
30. Pramanik Rishav, Sarkar Sourodip, and Sarkar Ram. 2022. “An Adaptive and Altruistic PSO-Based Deep Feature Selection Method for Pneumonia Detection from Chest X-Rays.” Applied Soft Computing 128. 1–23. pmid:35966452
31. Yagin Fatma Hilal et al. 2023. “Explainable Artificial Intelligence Model for Identifying COVID-19 Gene Biomarkers.” Computers in Biology and Medicine 154(November 2022). pmid:36738712
32. Hamal Susmita et al. 2024. “A Comparative Analysis of Machine Learning Algorithms for Detecting COVID-19 Using Lung X-Ray Images.” Decision Analytics Journal 11(June 2023). 100460.
33. Héberger Károly. 2024. “Frequent Errors in Modeling by Machine Learning: A Prototype Case of Predicting the Timely Evolution of COVID-19 Pandemic.” Algorithms 17(1).
34. Dewi K. C., Mustika W. F., & Murfi H. (2019, March). “Ensemble learning for predicting mortality rates affected by air quality”. In Journal of physics. Conference series (vol. 1192, No. 1, p. 012021). IOP Publishing.
35. de Moraes Batista A. F., Miraglia J. L., Rizzi Donato T. H., & Porto Chiavegatto Filho A. D. (2020). “COVID-19 diagnosis prediction in emergency care patients: a machine learning approach”. MedRxiv, 2020–04.
36. Kukar M., Gunčar G., Vovko T. et al. “COVID-19 diagnosis by routine blood tests using machine learning”. Sci Rep 11, 10738 (2021). pmid:34031483
37. Kassania S. H., Kassanib P. H., Wesolowskic M. J., Schneidera K. A., and Detersa R., “Automatic Detection of Coronavirus Disease (COVID-19) in X-ray and CT Images: A Machine Learning Based Approach,” Biocybern. Biomed. Eng., vol. 41, no. 3, pp. 867–879, 2021.
38. Adimoolam M., Govindharaju K., John A., Mohan S., Ahmadian A., and Ciano T., “A hybrid learning approach for the stage-wise classification and prediction of COVID-19 X-ray images,” Expert Syst., vol. 39, no. 4, 2022.
39. Abayomi-Alli O. O., Damaševičius R., Maskeliūnas R., and Misra S., “An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples,” Sensors, vol. 22, no. 6, 2022. pmid:35336395
40. Sagi O. and Rokach L., “Ensemble learning: A survey,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 8, no. 4, pp. 1–18, 2018.
41. Ndwandwe D. and Wiysonge C. S., “COVID-19 vaccines,” Curr. Opin. Immunol., vol. 71, no. 1, pp. 111–116, 2021. pmid:34330017
42. McCoy D., Mgbara W., Horvitz N., Getz W. M., and Hubbard A., “Ensemble machine learning of factors influencing COVID-19 across US counties,” Sci. Rep., vol. 11, no. 1, pp. 1–14, 2021.
43. AlJame M., Ahmad I., Imtiaz A., and Mohammed A., “Ensemble learning model for diagnosing COVID-19 from routine blood tests,” Informatics Med. Unlocked, vol. 21, p. 100449, 2020. pmid:33102686
44. R. Shaaque, A. Mehmood, G. S. Choi, R. Shafique, and S. Ullah, “Cardiovascular Disease Prediction System Using Extra Trees Classifier,” 2019.
45. Shrivastav L. K. and Jha S. K., “A gradient boosting machine learning approach in modeling the impact of temperature and humidity on the transmission rate of COVID-19 in India,” Appl. Intell., vol. 51, no. 5, pp. 2727–2739, 2021. pmid:34764559
46. S. Tripath, “Gradient-Boosting Machine Model,” pp. 19–21.
47. J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyper-parameter optimization,” Adv. Neural Inf. Process. Syst. 24 25th Annu. Conf. Neural Inf. Process. Syst. 2011, NIPS 2011, pp. 1–9, 2011.
48. Xia X., Jiang S., Zhou N., Li X., and Wang L., “Genetic algorithm hyper-parameter optimization using taguchi design for groundwater pollution source identification,” Water Sci. Technol. Water Supply, vol. 19, no. 1, pp. 137–146, 2019.
49. www.kaggle.com/marianarfranklin/mexico-covid19-clinical-data/
50. Thapa, Surendrabikram, Surabhi Adhikari, Awishkar Ghimire, and Anshuman Aditya. 2020. “Feature Selection Based Twin-Support Vector Machine for the Diagnosis of Parkinson’s Disease.” IEEE Region 10 Humanitarian Technology Conference, R10-HTC 2020-December(December).
51. Xiong Yibai et al. 2022. “Comparing Different Machine Learning Techniques for Predicting COVID-19 Severity.” Infectious Diseases of Poverty 11(1). 1–9.
52. D. Devetyarov and I. Nouretdinov, “Prediction with Confidence Based on a Random Forest Classifier,” HAL Id: hal-01060649, pp. 0–8, 2017.
53. Imam A. T., Alhroob A., and Alzyadat W. J., “SVM Machine Learning Classifier to Automate the Extraction of SRS Elements,” Int. J. Adv. Comput. Sci. Appl., vol. 12, no. 3, pp. 174–185, 2021.
54. D. A. Pisner and D. M. Schnyer, Support vector machine. Elsevier Inc., 2019.
55. Rai N., Kaushik N., Kumar D., Raj C., and Ali A., “Mortality prediction of COVID-19 patients using soft voting classifier,” Int. J. Cogn. Comput. Eng., vol. 3, no. June, pp. 172–179, 2022.
56. Florea A. C. and Andonie R., “Weighted Random Search for hyperparameter optimization,” Int. J. Comput. Commun. Control, vol. 14, no. 2, pp. 154–169, 2019.
57. Haqmi Abas M. A., “Agarwood Oil Quality Classification using Support Vector Classifier and Grid Search Cross Validation Hyperparameter Tuning,” Int. J. Emerg. Trends Eng. Res., vol. 8, no. 6, pp. 2551–2556, 2020.
58. Ali Yasser A., Emad Mahrous Awwad, Muna Al-Razgan, and Maarouf Ali. 2023. “Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity.” Processes 11(2).
59. Chieregato Matteo et al. 2022. “A Hybrid Machine Learning/Deep Learning COVID-19 Severity Predictive Model from CT Images and Clinical Data.” Scientific Reports 12(1). 1–15.
60. Muhammad L. J., Algehyne E. A., Usman S. S., Ahmad A., Chakraborty C., & Mohammed I. A. (2021). Supervised machine learning models for prediction of COVID-19 infection using epidemiology dataset. SN computer science, 2(1), 1–13. pmid:33263111
61. Han X., Hu Z., Wang S., & Zhang Y. (2022). A survey on deep learning in COVID-19 diagnosis. Journal of imaging, 9(1), 1. pmid:36662099
62. Bode B., Garrett V., Messler J., McFarland R., Crowe J., Booth R., & Klonoff D. C. (2020). Glycemic characteristics and clinical outcomes of COVID-19 patients hospitalized in the United States. Journal of diabetes science and technology, 14(4), 813–821. pmid:32389027
63. Chadaga K., Prabhu S., Umakanth S., Bhat K., Sampathila N., & Chadaga R. (2021). COVID-19 mortality prediction among patients using epidemiological parameters: an ensemble machine learning approach. Engineered Science, 16(10), 221–233.
64. Becerra-Sánchez A., Rodarte-Rodríguez A., Escalante-García N. I., Olvera-González J. E., De la Rosa-Vargas J. I., Zepeda-Valles G., et al. (2022). Mortality analysis of patients with COVID-19 in Mexico based on risk factors applying machine learning techniques. Diagnostics, 12(6), 1396. pmid:35741207
