
Enhancing machine learning performance in cardiac surgery ICU: Hyperparameter optimization with metaheuristic algorithm

  • Ali Bahrami,

    Roles Conceptualization, Data curation, Methodology, Software, Validation, Visualization, Writing – original draft

    Affiliation School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran

  • Morteza Rakhshaninejad,

    Roles Conceptualization, Methodology, Software, Visualization, Writing – original draft

    Affiliation School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran

  • Rouzbeh Ghousi ,

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    ghousi@iust.ac.ir

    Affiliation School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran

  • Alireza Atashi

    Roles Conceptualization, Data curation, Methodology, Resources, Validation, Writing – review & editing

    Affiliations Department of Digital Health, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran, Cancer Informatics Research Group, Clinical Research Department, Breast Cancer Research Center, Motamed Cancer Institute, ACECR, Tehran, Iran

Abstract

The healthcare industry generates a massive volume of data, a potential goldmine of information that can be extracted through machine learning (ML) techniques. The Intensive Care Unit (ICU) stands out as a focal point within hospitals and provides a rich source of data for informative analyses. This study examines the cardiac surgery ICU, where patient ventilation is a vital concern: ventilator-supported breathing is a fundamental need within the ICU, and the limited availability of ventilators in hospitals has become a significant issue. A crucial consideration for healthcare professionals in the ICU is prioritizing patients who require ventilators immediately. To address this issue, we developed a prediction model that combines four ML and deep learning (DL) models in an ensemble: LDA, CatBoost, Artificial Neural Networks (ANN), and XGBoost. We utilized Simulated Annealing (SA) and a Genetic Algorithm (GA) to tune the hyperparameters of the ML models constituting the ensemble. The results showed that our approach raised the sensitivity of the tuned ensemble model to 85.84%, which is better than both the ensemble model without hyperparameter tuning and an AutoML model. This significant improvement in model performance underscores the effectiveness of our hybrid approach in prioritizing the need for ventilators among ICU patients.

1. Introduction

The development of biomedical equipment and healthcare services has enabled the Intensive Care Unit (ICU) to collect vast amounts of data. This advancement has led to a growing interest in analyzing this data for various purposes, such as improving patient care and predicting patient outcomes [1]. Early and accurate diagnosis is essential for reducing mortality rates. Accurately predicting the risk of death for patients awaiting heart surgery can provide crucial information, enabling life-saving interventions while also reducing costs and time. The immediate implementation of predictive mortality risk assessment is essential for heart surgery patients [2].

Common scoring systems, including APACHE (Acute Physiology and Chronic Health Evaluation) and SAPS (Simplified Acute Physiology Score), are frequently used to predict mortality in ICUs. Although these models are robust in certain contexts, their effectiveness, particularly that of the APACHE II system in prolonged mechanical ventilation cases, is still debatable. Despite this uncertainty, the APACHE II score has emerged as a reliable predictor of mortality for patients undergoing weaning from prolonged mechanical ventilation [3]. According to Khwannimit et al. [4], a customized version of APACHE II surpasses a customized SAPS II in accuracy for predicting in-hospital mortality; their findings suggest that the tailored APACHE II may be used by ICUs with comparable patient profiles for efficient quality assessment and mortality prediction. Conversely, Poole et al. [5] observed that SAPS II and SAPS III lacked precision in mortality prediction, with SAPS III unexpectedly overestimating mortality compared to SAPS II.

The integration of machine learning (ML) with traditional scoring systems marks a significant advancement in predicting outcomes like mortality and ventilation requirements, including weaning success and prolonged ventilation. The ML-based techniques developed by Kong et al. [6] showed strong predictive performance; notably, the Gradient Boosting Machine (GBM) proved to be the most effective in predicting the risk of in-hospital death. Advanced ML models, including Balanced Random Forest (BRF), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGB), Multilayer Perceptron (MLP), and Logistic Regression (LR), have demonstrated superior accuracy over traditional systems in predicting 30-day mortality for mechanically ventilated patients [7]. Mechanical ventilation, a critical intervention in ICUs, is typically administered through an endotracheal tube, known as invasive mechanical ventilation, to assist patients with severe respiratory problems [8].

One of the primary challenges in ICU care is the process of weaning patients off ventilators. Research has shown that ML techniques, including LR, XGBoost (Extreme Gradient Boosting), and Support Vector Machines (SVM) that utilize related variables, are effective in accurately predicting the outcomes of ventilator weaning [9]. ML models utilizing the XGBoost and CatBoost algorithms have demonstrated brilliant accuracy in predicting the need for mechanical ventilation and assessing the mortality risk in COVID-19 patients [10]. These models stand out for their precision in critical healthcare scenarios, particularly in managing the pandemic’s challenges.

In the realm of pediatric cardiac ICU, mechanical ventilation remains a cornerstone of treatment. This is especially true for patients with serious cardiovascular problems, where the weaning process from artificial breathing is a crucial and delicate part of their treatment [11]. Li et al. [12], highlighted the increased mortality risk associated with patients undergoing mechanical ventilation for congestive heart failure (CHF). They developed and validated a CatBoost model capable of accurately predicting hospital mortality in this patient group. Hsieh et al. [13] conducted a comparative study of various ML models against conventional metrics to predict the mortality rate of patients undergoing unscheduled extubation (UE) in ICU. Their findings revealed that the random forest (RF) model was particularly effective in predicting mortality among these patients.

Moreover, Meenen et al. [14] discovered a correlation between the mechanical power of ventilation, the driving pressure, and important patient outcomes such as ventilation time and mortality. Their study focused on determining how well these features predicted mortality, particularly 24 hours after invasive ventilation started. The development of effective natural ventilation systems for sustainable building designs is also becoming increasingly important: Park et al. [15] examined eight ML algorithms for predicting natural ventilation rates, methods well suited to interpreting nonlinear correlations between indoor and outdoor environmental variables. Furthermore, Liang et al. [16] proposed an automated model using data from the MIMIC-III cohort to predict ventilator-associated pneumonia (VAP). The model showed impressive performance, with high scores in AUC (Area Under the Curve), sensitivity, and specificity metrics. Sayed et al. [17] attempted to determine the optimum early course of action within the first 48 hours in the ICU, applying supervised machine learning techniques to estimate the duration of mechanical ventilation needed following the onset of acute respiratory distress syndrome (ARDS). Another study [18] utilized a variety of ML methods to predict the need for early treatments for Multiple Organ Dysfunction Syndrome (MODS) in the ICU, based on data regarding ventilator use.

Several studies have aimed to predict extubation failure, intubation, successful ventilator mode shifting, or prolonged mechanical ventilation in the ICU, achieving good results [19–24]. Last but not least, Shashikumar et al. [25] developed a transparent DL strategy to predict hospitalized patients’ need for mechanical ventilation, including those with COVID-19, using publicly available data from electronic health records.

Bahrami et al. [26] presented a two-stage hybrid model, based on a combination of ML and expert opinion, to predict patients’ need for ventilators in the cardiac surgery ICU and then prioritized patients so that the limited ventilators are assigned to the most critical patients first.

Despite significant advances, few studies have specifically focused on predicting the need for ventilators in cardiac surgery ICU patients. Our study aims to address this gap by employing ML techniques to forecast the necessity of ventilators for these patients. This predictive capability is especially vital during pandemics or periods of increased ICU admissions, where efficient resource allocation can be lifesaving. To enhance the performance and reliability of ML models, it is crucial to conduct hyperparameter tuning. This process is essential for optimizing the functionality of ML algorithms.

The use of metaheuristic algorithms for selecting optimal parameters of ML models is a common and effective approach [27]. Popular algorithms in this domain include Genetic Algorithms (GA) [28], Simulated Annealing (SA) [29], and Ant Colony Optimization (ACO) [27]. Among these methods, the SA metaheuristic algorithm is a particularly good option for hyperparameter tuning in healthcare systems, owing to its proven effectiveness in navigating complex solution spaces and efficiently identifying optimal parameters. This choice is grounded in SA’s ability to escape local optima, a common challenge in model optimization, thereby ensuring more robust and accurate performance of our ML models in critical healthcare settings [29].

We opted for metaheuristic algorithms such as SA and GA over traditional methods like cross-validation and grid search [30] due to their superior efficiency in navigating complex and high-dimensional parameter spaces. These algorithms excel at finding optimal solutions more efficiently than exhaustive search methods such as grid search [31], which becomes computationally impractical as the number of hyperparameters increases. Unlike grid search, metaheuristics can escape local optima and explore the solution space more globally, which is crucial for achieving optimal model performance [32]. Additionally, we reviewed these models as presented in Table 1 of this study.

In this study, to improve upon our previous work [26] and to enhance the performance of ML and DL algorithms, including linear discriminant analysis (LDA), CatBoost, and ANN, in predicting the necessity of ventilators for patients, we employed SA for hyperparameter tuning. This strategy is pivotal in optimizing model parameters, enhancing evaluation metrics, and preventing overfitting, thus improving predictive accuracy in critical healthcare settings. Table 1 presents a summarized overview of research studies focused on predicting mortality and ventilation-related cases in ICU patients. Our study introduces five key innovations:

  • A hybrid model combining ML and DL models with metaheuristic algorithm.
  • The application of ML and DL models to forecast ventilator needs in the cardiac surgery ICU.
  • The use of RF and GBM for calculating the importance of features.
  • Hyperparameter tuning with the SA and GA metaheuristic algorithms.
  • Formation of an ensemble model using weighted voting to enhance prediction of ventilator needs.

The rest of this study is structured as follows: Section 2 describes the dataset and the preprocessing techniques utilized in this study, and also presents the hybrid model and the model evaluation. Section 3 presents the results and discussion, and finally, the conclusions of this study and suggestions for future work are outlined in Section 4.

2. Materials and methods

An essential component of every research project is understanding the problem. This study develops a hybrid model to predict the requirement for ventilators among patients hospitalized in the cardiac surgery ICU. We do this to avoid the life-threatening and financial hazards associated with delayed ventilator assignment. Thus, in response to this challenge, we present our hybrid model as a potential solution. It is noteworthy that the majority of the coding for this research was implemented in Python version 3.9.13, and all experiments were run on an Intel Core i7 laptop running at 2.40 GHz with 16 GB of RAM. The methodology utilized to achieve the objectives of this study is depicted in Fig 1.

2.1. Dataset

For this study, the dataset was meticulously gathered, with ethical consent from all participants, at a hospital associated with Shahid Beheshti University of Medical Sciences and Health Services in Iran [2]; written consent was obtained from all participants. The data collection, undertaken with due regard for ethical standards under the ethical code IR.ACER.IBCRC.REC.1394.71, was accessed for research purposes on March 7, 2016. The data, which focuses on patients admitted to the ICU, includes measurements obtained both during and after heart surgery procedures. The dataset contains 1,098 records and 32 features, of which 31 are used for predicting ventilator use, with the “On_pump” feature designated as the response variable. Table 2 details the main features of the dataset. The subsequent section elaborates on the data preprocessing methods employed in this study.

Table 2. Main features of our dataset of cardiac surgery ICUs.

https://doi.org/10.1371/journal.pone.0311250.t002

2.2. Preprocessing

Data preprocessing is a crucial and indispensable step in enhancing the development of ML models. Data preparation, the third phase of the CRISP-DM framework, is a critical stage of data mining projects, consuming a considerable portion of the overall time and effort. Thus, the quality and nature of the data are paramount for the success of the ML modeling process. To optimize the effectiveness of our modeling, we undertake five detailed steps of data preprocessing, which are crucial for the integrity of our analysis. The following are the steps we will implement in this section:

  • Handling missing values and applying feature importance
  • Utilizing one-hot encoding for categorical variables
  • Partitioning the data into train, validation, and test sets
  • Improving imbalanced dataset by using SMOTE method
  • Using the Z-Score in order to standardize the data

2.2.1. Handling missing values and applying feature importance.

Our initial dataset comprised 32 columns (31 independent features and 1 dependent feature), as detailed in Table 2. The presence of null values does not necessarily indicate data missingness; for instance, null values in binary columns often represent the value 0. To ensure accurate interpretation, we consulted a medical expert familiar with the data collection process. Several features were identified as redundant or non-essential and were subsequently removed to refine the dataset. The ’Code’ feature, representing patient IDs, was removed as it did not contribute to our analysis. Similarly, the ’Valve’ feature was removed as it duplicated information found in the ’CABG_Valve’ feature, and the ’Off_pump’ feature, having values opposite to those of ’On_pump’, was also eliminated.

Furthermore, certain features exhibited a high percentage of missing values, prompting specific actions: ’DURATION’ and ’CHF’, with more than 95% missing values, were removed to preserve the validity and reliability of our model. For features ’PH’, ’MG’, and ’MAP’, which had more than 15% but less than 50% missing values, we employed the K-Nearest Neighbors (KNN) imputation method. This method was crucial in preserving the underlying data structure and maintaining feature integrity by leveraging the similarity between data points. Additionally, six samples that lacked any values were removed.
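The study’s imputation code is not shown; the following minimal sketch, with hypothetical toy values standing in for features such as ’PH’, ’MG’, and ’MAP’, illustrates KNN imputation with scikit-learn’s KNNImputer:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical toy matrix standing in for three lab features;
# np.nan marks the missing entries to be imputed.
X = np.array([
    [7.40, 1.9, 85.0],
    [7.35, np.nan, 90.0],
    [7.38, 2.1, np.nan],
    [7.42, 2.0, 88.0],
])

# Each missing value is replaced by the average of the corresponding values
# in the k nearest samples (k=2 here), measured on the observed features.
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
```

Because imputed values are derived from the most similar samples rather than a global mean, this approach better preserves the local structure of the data.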

Upon addressing these issues, 26 of the initial 31 independent features and 1092 of the 1098 samples were retained for further analysis. We also conducted a feature importance analysis using both RF and GBM models, as illustrated in Fig 2. This analysis was instrumental in identifying and retaining the most informative features, using a threshold importance score of 0.001. Only features with significant predictive power were included in further analyses. The intersection of important features identified by these models, as shown in Fig 2, included ’Arrythmia’, ’Number_of_Grafts’, ’Addiction’, ’Arrest’, ’Hight’, ’Weight’, ’MAP’, ’MG’, ’Grade_of_Age’, ’Age’, ’CABG_Valve’, ’EF’, ’IABP’, ’Cross_Clamp’, ’Smoker’, ’MI’, ’Albumin’. These features were carefully considered to ensure robustness in our feature set.
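The selection procedure above can be illustrated on synthetic data (the clinical dataset is not public): fit both models, apply the 0.001 importance threshold, and keep the intersection of the surviving features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic stand-in for the ICU data (the real features are listed in Table 2).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
gbm = GradientBoostingClassifier(random_state=0).fit(X, y)

threshold = 0.001  # importance cut-off used in this study
rf_keep = set(np.where(rf.feature_importances_ > threshold)[0])
gbm_keep = set(np.where(gbm.feature_importances_ > threshold)[0])

# Keep only features that both models consider informative.
selected = sorted(rf_keep & gbm_keep)
```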

Fig 2. Feature importance with Random Forest and Gradient Boosting.

https://doi.org/10.1371/journal.pone.0311250.g002

2.2.2. Utilizing one-hot encoding.

To incorporate categorical variables into ML models, it’s advisable to transform them into binary variables using one-hot encoding [10, 22]. For instance, a categorical feature like Sex, which can take only the values female or male, cannot be directly incorporated into an ML model; one-hot encoding transforms the Sex column into a binary column, where male is represented by 1 and female by 0.
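For example, with pandas (one common way to perform this step; the paper does not show its encoding code), the Sex column can be one-hot encoded as:

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "female", "male"]})

# drop_first=True keeps a single 0/1 column for a two-level category:
# the resulting "Sex_male" column is 1 for male and 0 for female.
encoded = pd.get_dummies(df, columns=["Sex"], drop_first=True)
```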

2.2.3. Partitioning the data into train, validation, and test sets.

To mitigate the risk of overfitting in ML models, we split the dataset into separate training, validation and testing sets [7]. The primary objective of this data partitioning is to evaluate the model’s performance on unseen data, thereby ensuring its generalizability. The training set is used initially to develop the models, while the validation set is applied for fine-tuning the hyperparameters of these models using metaheuristic algorithms. Finally, we evaluate their performance on the test set, which provides insights into their general applicability. To determine the most effective model, we evaluated their performance on various metrics. To achieve this, we divided the data into a 60-20-20 split, with 60% dedicated to training, 20% reserved for validation, and 20% for testing. Specifically, of the 1092 total records, 655 were utilized to train the models, 218 for validation, and 219 to test their performance. This methodical approach allows for a comprehensive assessment of each model’s ability to generalize beyond the training data.
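A two-step split of this kind can be sketched with scikit-learn’s train_test_split: first carve off the training set, then halve the remainder. With 1092 records and scikit-learn’s rounding, this yields the 655/218/219 counts reported above (the data below are placeholders).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the dataset's 1092 retained records.
X = np.arange(1092 * 2, dtype=float).reshape(1092, 2)
y = np.random.RandomState(0).randint(0, 2, size=1092)

# Step 1: hold out 40% of the data; step 2: split that 40% in half,
# giving a 60/20/20 train/validation/test partition.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42, stratify=y_rest)
```

Stratifying on the labels keeps the class ratio comparable across the three sets.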

2.2.4. Improving imbalanced data by using SMOTE method.

To address potential biases introduced by imbalanced data in our study, we employed the Synthetic Minority Over-sampling Technique (SMOTE). This method is particularly effective in balancing datasets by synthesizing new samples for the under-represented class [1]. By using SMOTE, we enhanced the dataset’s diversity and improved the generalizability of our models. SMOTE works by selecting data points that are close in the feature space, drawing a line between the points in the feature space, and creating a new point along that line. This approach is crucial for creating a balanced dataset and avoiding model bias towards the majority class.

SMOTE was applied exclusively to the training data to balance it. Initially, the training data had an imbalance between the two classes: class 0 (patients who do not need a ventilator) and class 1 (patients who need a ventilator). After applying SMOTE to the training data, we achieved a balanced distribution with 414 samples in each class, totaling 828 samples. In contrast, the validation set, which did not undergo SMOTE, contains 142 samples that do not need a ventilator and 76 that do, reflecting the original distribution of the dataset. Similarly, the test set has 154 samples that do not need a ventilator and 65 that do. This strategic application of SMOTE ensures that our model training is robust and fair while allowing us to assess model performance against the more naturally imbalanced conditions seen in the validation and test data.
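SMOTE itself is available ready-made in the imbalanced-learn library; the interpolation idea described above can be illustrated with a small self-contained numpy sketch (a simplified stand-in, not the full algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like_sample(X_minority, n_new, k=5, rng=rng):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen point and one of its k nearest neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        x = X_minority[i]
        # Distances from x to every minority point (index 0 is x itself).
        d = np.linalg.norm(X_minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between x and its neighbour
        synthetic.append(x + gap * (X_minority[j] - x))
    return np.array(synthetic)

X_min = rng.normal(size=(20, 3))   # hypothetical minority-class samples
X_new = smote_like_sample(X_min, n_new=10)
```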

2.2.5. Using the Z-score to standardize the data.

Feature scaling or standardization, a critical step in data preprocessing, is essential when dealing with datasets where features have varying scales or are measured in distinct units. The discrepancies in feature ranges can hinder the effectiveness of many ML models. For instance, in algorithms that use distance-based calculations, a feature with a significantly larger range will exert an undue influence on the distance calculations.

The Z-score is a popular approach for standardizing data. It entails subtracting the mean of each feature and then dividing by its standard deviation [1].

z = (x − μ) / σ (1)

Eq (1) gives the Z-score of each sample, where x is the feature value, μ is the mean of that feature, and σ is its standard deviation.

Upon completing the standardization procedure, all features will be centered at zero and have a standard deviation of one, ensuring a consistent scale. This standardization is performed separately for the train, validation, and test sets. Standardizing these sets separately safeguards against data leakage, a phenomenon where information from the validation and test data infiltrates the modeling process.
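Eq (1) amounts to a per-column centering and scaling, for example:

```python
import numpy as np

def zscore(X):
    """Standardize each column: subtract its mean, divide by its
    standard deviation (Eq 1)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Hypothetical toy data (e.g., height in cm and weight in kg).
X = np.array([[170.0, 70.0],
              [160.0, 55.0],
              [180.0, 85.0]])
Z = zscore(X)
```

After the transform, every column has mean zero and unit standard deviation.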

2.3. Modeling

Previously, we highlighted the significance of patient ventilation in the ICU setting. Mechanical ventilation, which provides artificial respiration, is a crucial requirement for ICU patients, particularly given the limited availability of ventilators within hospitals. Healthcare workers strive to identify the patients who require ventilators most urgently. To address this challenge, we developed a hybrid ensemble model to predict the need for ventilators among patients in the cardiac surgery ICU.

To address the critical need for ventilators in the cardiac surgery ICU and prevent life-threatening consequences of delayed ventilator allocation, we propose the following ensemble model to accurately predict ventilator requirements:

2.3.1. Machine learning & metaheuristic models.

In the first stage of our ensemble model, we implemented four classification models to predict the necessity of ventilators for patients in the cardiac surgery ICU. This involved the use of supervised ML and DL models, specifically LDA, CatBoost, ANN, and XGBoost, as provided in the Scikit-learn, CatBoost, Keras, and xgboost libraries of Python. Each of these classifiers was trained using the training dataset to assess their performance differences.

We chose CatBoost for its ability to handle categorical features without extensive preprocessing and its efficiency in both regression and classification tasks [33]. LDA was selected for its strength in dimensionality reduction and class distinction based on linear combinations of features, which is vital for medical datasets [34]. ANN was chosen for its capability to model non-linear relationships in medical data through deep learning architectures [35]. XGBoost was included due to its proven efficacy and its recommendation by the TPOT AutoML algorithm [36], enhancing the ensemble’s performance. These models were validated against empirical evidence, ensuring their superior performance in metrics such as Accuracy, Precision, Sensitivity, Specificity, and F1-score, as documented in our previous study [26]. We reviewed related studies, as presented in Table 1, to support our model selection.

After this, we performed hyperparameter tuning to determine which parameter values of our classifiers should be used in the ensemble model, either to improve evaluation metrics or to avoid overfitting. Hyperparameters in ML are parameters whose values are set before the learning process starts [12]. To optimize these parameters, we employed the SA metaheuristic algorithm, widely recognized for its efficiency in navigating and optimizing complex parameter spaces. This approach was chosen for its ability to effectively balance exploration and exploitation of the parameter space, thereby improving evaluation metrics and reducing the risk of overfitting.

The ANN model underwent parameter adjustments including the number of layers, units per layer, activation functions, optimizer type, loss function, batch size, and epochs. For the LDA model, parameters like solver type, number of components, store covariance option, tolerance level, and shrinkage were fine-tuned. The CatBoost model’s tuning focused on parameters such as learning rate, tree depth, L2 leaf regularization, and the number of iterations. The tuning of the XGBoost model included hyperparameters such as C, dual, loss, penalty, tol, learning rate, max depth, max features, min samples leaf, min samples split, n_estimators, and subsample. Each of these hyperparameters was carefully selected based on recommendations from developer documentation and insights from related studies, leading to different numbers of parameters being optimized for each model. This variance is inevitable due to the unique characteristics and capabilities of each model, ensuring each is optimized to enhance performance and robustness against overfitting.

To effectively manage potential overfitting in predictive models, we employed a range of regularization techniques tailored to each model’s architecture. For the LDA model, we considered a shrinkage parameter to be optimized through SA and GA, effectively balancing model complexity and generalization. In CatBoost model, instead of traditional L1 or L2 regularization, we adjusted the l2_leaf_reg parameter to 3, based on preliminary tests which suggested this setting optimally prevents overfitting while maintaining model flexibility. For XGBoost, we applied L1 regularization with an alpha value of 0.01 to encourage feature selection and L2 regularization with a lambda value of 1.0 to penalize larger weights, thus enhancing model generalization across various datasets. Lastly, in the ANN, both L1 and L2 regularizations were set at 0.01 directly in the layers, and a combined L1_L2 regularization was employed to ensure a balance between reducing model complexity and retaining predictive accuracy.

These optimizations were captured in detailed logs, recording the best accuracy, precision, recall (sensitivity), F1-score, and specificity.

The hyperparameters set for each ML and DL model are summarized in Table 3. This table provides a comprehensive overview of the hyperparameters utilized during the tuning phase. The ranges for these hyperparameters were determined based on the developer documentation for each model, expert feedback tailored to our problem characteristics, and insights from other studies, which are referenced in the last column of Table 3. This approach ensures that the tuning is grounded in both theoretical best practices and practical insights relevant to our specific research context. Also, the pseudocode below (Algorithm 1) represents the process of hyperparameter tuning using SA:

Table 3. The hyperparameters set for each ML and DL model.

https://doi.org/10.1371/journal.pone.0311250.t003

Algorithm 1. Pseudocode for Hyperparameter Selection using Simulated Annealing (SA)
1. Define the hyperparameter space for each model:
   ANN_hyperparameters = [number_of_layers, units_per_layer, activation_functions, optimizer, loss_function, batch_size, epochs]
   LDA_hyperparameters = [solver, number_of_components, store_covariance, tolerance, shrinkage]
   CatBoost_hyperparameters = [learning_rate, depth, l2_leaf_reg, iterations]
   XGBoost_hyperparameters = [C, dual, loss, penalty, tol, learning_rate, max_depth, max_features, min_samples_leaf, min_samples_split, n_estimators, subsample]
2. Initialize the SA algorithm parameters:
   - initial_temperature (initial temperature for SA) = 100
   - cooling_rate (cooling rate for the SA process) = 0.95
   - max_iterations (maximum number of SA iterations) = 250
3. For each model (ANN, LDA, CatBoost, XGBoost):
   - Set current_solution to a random selection from the model’s hyperparameter space
   - Evaluate current_solution using the model’s evaluation function (e.g., evaluate_ann for ANN)
   - Initialize best_solution to current_solution
   - For iteration in range(max_iterations):
     - Generate new_solution by slightly modifying current_solution’s parameters
     - Evaluate new_solution using the model’s evaluation function
     - Calculate acceptance_probability using the SA acceptance criterion
     - If acceptance_probability > a random value between 0 and 1:
       - Update current_solution to new_solution
     - Update best_solution if current_solution is better than best_solution
     - Reduce the temperature according to cooling_rate
   - Return the final best_solution for the model
4. After completing SA for all models, select the best hyperparameters for each model.
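Algorithm 1 can be condensed into a compact, runnable sketch. The search space and evaluation function below are hypothetical stand-ins: in a real run, evaluate() would train a model and return its validation score, as in step 3 of Algorithm 1.

```python
import math
import random

random.seed(0)

# Hypothetical discrete search space; the real spaces are listed in Table 3.
space = {"learning_rate": [0.01, 0.05, 0.1], "depth": [4, 6, 8, 10]}

def evaluate(params):
    # Stand-in for training a model and scoring it on the validation set;
    # higher is better, with a single optimum at (0.05, 6).
    return -((params["learning_rate"] - 0.05) ** 2
             + (params["depth"] - 6) ** 2 / 100)

def neighbour(params):
    # Perturb one randomly chosen hyperparameter (the "slight modification").
    new = dict(params)
    key = random.choice(list(space))
    new[key] = random.choice(space[key])
    return new

def simulated_annealing(temp=100.0, cooling=0.95, max_iter=250):
    current = {k: random.choice(v) for k, v in space.items()}
    current_score = evaluate(current)
    best, best_score = dict(current), current_score
    for _ in range(max_iter):
        cand = neighbour(current)
        cand_score = evaluate(cand)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability
        # exp(delta / temp), which lets SA escape local optima early on.
        if delta > 0 or random.random() < math.exp(delta / temp):
            current, current_score = cand, cand_score
        if current_score > best_score:
            best, best_score = dict(current), current_score
        temp *= cooling  # geometric cooling schedule
    return best, best_score

best_params, best_score = simulated_annealing()
```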

To evaluate the predictive performance of the models, we measure their performance on the validation set for further tuning and on the test set for final evaluation and comparison. The ensemble model, constructed from the individually optimized models, is then tested to choose the best setup based on various evaluation metrics. We discuss the ensemble model and the evaluation metrics in the next subsection.

2.3.2. Ensemble model construction.

In our study, we have developed an ensemble model that synthesizes the predictive power of four distinct base classifiers—CatBoost, LDA, ANN and XGBoost—each individually fine-tuned using metaheuristic optimization algorithms. This tuning process was essential for adjusting the hyperparameters specific to our data, maximizing the effectiveness of each model within the ensemble framework.

Our ensemble employs a weighted voting mechanism to integrate outputs from each base model. The weighting system is designed to allocate more influence to models that demonstrate superior performance on the validation set, enhancing the predictive accuracy and reliability of the ensemble’s overall output. This methodological choice is particularly effective in leveraging the distinct strengths of each classifier, thereby improving the ensemble’s capability to predict ventilator needs accurately.

The model training was conducted on 60% of our dataset, with the remainder split equally between validation and testing. The validation phase was crucial not only for hyperparameter tuning but also for determining the appropriate weights for each model within the ensemble. This setup ensures that our model is robust and well-adapted to the nuances of our dataset, minimizing the risk of overfitting.

Following the training and validation, we evaluated the ensemble model on the test set. This evaluation helped us assess the practical effectiveness of the ensemble in a controlled, yet realistic setting. We calculated key performance metrics such as accuracy, precision, sensitivity, specificity, and F1-score for each component model as well as for the ensemble as a whole. These metrics provided a comprehensive view of how each model contributes to the ensemble and how effectively the ensemble performs as a unit. The final prediction can be represented as:

ŷ(x) = sign(Σ_{i=1}^{n} w_i f_i(x) − 0.5) (2)

Where the parameter n represents the number of classifiers integrated within the ensemble. Each classifier, indexed by i, contributes to the final prediction with a specific weight wi, which is indicative of its importance or performance relative to the ensemble. The function fi(x) denotes the prediction output by the i-th classifier for the input x. Additionally, the function sign() is utilized to convert the aggregated weighted sum of predictions into a definitive class label. This conversion depends on a threshold of 0.5, which is commonly used in binary classification to determine the class labels. This ensemble not only capitalizes on the individual strengths of each model but also significantly enhances our ability to make accurate predictions about ventilator needs, which is critical for efficient ICU management and patient care.
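A minimal weighted soft-voting step of this kind might look as follows; the weights (derived here from hypothetical validation F1-scores) and the per-classifier probabilities are purely illustrative:

```python
import numpy as np

def ensemble_predict(probs, weights, threshold=0.5):
    """Weighted vote: probs is (n_classifiers, n_samples) of
    positive-class probabilities f_i(x); weights w_i are normalized."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize the w_i
    score = weights @ np.asarray(probs)        # sum_i w_i * f_i(x)
    return (score >= threshold).astype(int)    # thresholding step

# Hypothetical validation F1-scores used to derive the weights
# (order: CatBoost, LDA, ANN, XGBoost).
val_f1 = np.array([0.78, 0.73, 0.77, 0.80])
probs = np.array([[0.9, 0.2, 0.6],
                  [0.7, 0.4, 0.3],
                  [0.8, 0.1, 0.7],
                  [0.6, 0.3, 0.8]])
print(ensemble_predict(probs, val_f1))  # [1 0 1]
```

Giving better validators larger weights means a strong classifier can outvote two weak ones that disagree with it, which is the behavior the weighting scheme is designed to produce.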

2.3.3. Comparison.

In this study, we implemented the Genetic Algorithm (GA) alongside Simulated Annealing (SA) to conduct an in-depth comparison of these hyperparameter optimization techniques. Our goal was to evaluate their efficacy in tuning the hyperparameters of our base classifiers—LDA, CatBoost, Artificial Neural Network (ANN), and XGBoost. We utilized GA, with hyperparameters defined in Table 3, to find optimal parameter sets for each base classifier. This approach allowed us to directly compare the effectiveness of GA and SA in enhancing the performance of these models by adjusting their hyperparameters to achieve the best possible outcomes. The pseudocode for hyperparameter tuning using GA is given below (Algorithm 2):

Algorithm 2. Pseudocode for Hyperparameter Selection using Genetic Algorithm (GA)
1. Define the hyperparameter space for each model:
   - ANN_hyperparameters = [number_of_layers, units_per_layer, activation_functions, optimizer, loss_function, batch_size, epochs]
   - LDA_hyperparameters = [solver, number_of_components, store_covariance, tolerance, shrinkage]
   - CatBoost_hyperparameters = [learning_rate, depth, l2_leaf_reg, iterations]
   - XGBoost_hyperparameters = [C, dual, loss, penalty, tol, learning_rate, max_depth, max_features, min_samples_leaf, min_samples_split, n_estimators, subsample]
2. For each model (ANN, LDA, CatBoost, XGBoost):
   - Generate an initial population of 20 individuals.
   - For each individual in the population:
     - Calculate the model's performance (F1-score) on the validation set.
   - For 250 generations:
     - Selection: select the top 10 performing individuals from the population to serve as parents for the next generation.
     - Crossover: for each pair of parents, perform a two-point crossover to produce offspring. This involves selecting two random crossover points in the hyperparameter list and swapping the segments between these points to create new offspring.
     - Mutation: apply mutations with a 0.1 mutation rate to the offspring, randomly altering one or more hyperparameters within their defined ranges. The standard mutation randomly selects a hyperparameter and changes it to another valid value from its range.
     - Evaluate new generation: assess the fitness of each new individual in the population using the F1-score.
   - Select the best individual: after all generations have been processed, identify the individual with the highest fitness score as possessing the optimal set of hyperparameters.
3. Output the best hyperparameter set:
   - Return the best hyperparameters and their associated performance metrics.
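A compact, self-contained sketch of the GA loop above is given below. The search space is a toy stand-in for the ranges of Table 3, and the fitness function is a synthetic placeholder for "train the model and score F1 on the validation set"; only the loop structure (population of 20, top-10 selection, two-point crossover, 0.1 mutation rate) mirrors Algorithm 2:

```python
import random
random.seed(0)

# Hypothetical search space (a small stand-in for Table 3's ranges).
SPACE = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "depth":         [4, 6, 8, 10],
    "l2_leaf_reg":   [1, 3, 5, 7],
    "iterations":    [100, 300, 500],
}
KEYS = list(SPACE)

def fitness(ind):
    # Synthetic stand-in for "train model, return validation F1".
    return -abs(ind["learning_rate"] - 0.1) - 0.01 * abs(ind["depth"] - 6)

def random_ind():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(p1, p2):
    # Two-point crossover on the ordered hyperparameter list.
    a, b = sorted(random.sample(range(len(KEYS)), 2))
    child = dict(p1)
    for k in KEYS[a:b]:
        child[k] = p2[k]
    return child

def mutate(ind, rate=0.1):
    # Each hyperparameter has a `rate` chance of being resampled.
    for k in KEYS:
        if random.random() < rate:
            ind[k] = random.choice(SPACE[k])
    return ind

pop = [random_ind() for _ in range(20)]      # initial population of 20
for _ in range(50):                          # the paper uses 250 generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                       # top-10 selection
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(10)]
    pop = parents + children

best = max(pop, key=fitness)
print(best)
```

In practice the fitness call would fit, e.g., a CatBoost model with the candidate hyperparameters and return its validation F1-score; everything else stays the same.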

We assessed the performance of each base classifier when their hyperparameters were not tuned using SA and GA, as well as the performance of an ensemble model composed of these untuned classifiers. These initial results are presented in Table 4. Further, we analyzed the performance of each base classifier after their hyperparameters were tuned with both GA and SA. Additionally, we evaluated an ensemble model comprising classifiers tuned with these methods. This allowed us to directly compare the impact of hyperparameter tuning on the performance of individual classifiers and their collective performance within an ensemble framework. The results of this analysis are detailed in Table 6, showcasing the effectiveness of each tuning method.

Table 4. Evaluation metrics of ML and DL models before hyperparameter tuning using SA algorithm.

https://doi.org/10.1371/journal.pone.0311250.t004

To provide a broader perspective and benchmark our results against current automated techniques, we incorporated an AutoML tool named TPOT into our study [36]. This tool simplifies the selection of ML models and hyperparameter tuning but typically limits this tuning to a narrow range of parameters. We conducted a detailed comparison between the performance of the AutoML-selected model and our ensemble models, which were tuned using SA and GA, and also evaluated them in their untuned states. This analysis, displayed in Table 7, highlights the performance distinctions and demonstrates how our tuning approaches potentially offer more robust customization compared to AutoML's more generic methodology.

While AutoML requires less technical knowledge and provides a streamlined, efficient process suitable for general applications, it lacks the granular control over hyperparameter settings that SA and GA provide. This control is crucial for addressing the specific needs of complex datasets, like ours, where flexibility in the tuning process is essential. The limited hyperparameter tuning range of AutoML compared to our extensive range may also impact the depth of model optimization achievable. Our study discusses these differences, emphasizing the trade-offs between the ease of AutoML and the detailed control offered by SA and GA, which can significantly influence model performance and transparency in research settings.

This comprehensive evaluation strategy not only highlighted the strengths and weaknesses of hyperparameter tuning methods like GA and SA but also underscored the potential of AutoML as a viable alternative in scenarios where manual tuning may be impractical. By comparing these methods across a range of performance metrics, we gained valuable insights into their applicability and effectiveness in a critical healthcare setting, providing essential guidance for future implementations of ML technologies where prediction accuracy and model reliability are crucial.
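For comparison with the GA sketch, the SA tuning loop can be written in the same style. Again, the score function is a synthetic placeholder for the validation F1 objective, and the search space, neighbour move, and geometric cooling schedule are illustrative rather than the paper's exact settings:

```python
import math
import random
random.seed(1)

# Hypothetical two-parameter search space.
SPACE = {"learning_rate": [0.01, 0.05, 0.1, 0.3],
         "max_depth": [3, 5, 7, 9]}

def score(cfg):
    # Synthetic stand-in for "validation F1 of a model trained with cfg".
    return 1.0 - abs(cfg["learning_rate"] - 0.1) - 0.02 * abs(cfg["max_depth"] - 5)

def neighbour(cfg):
    # Perturb one randomly chosen hyperparameter.
    new = dict(cfg)
    k = random.choice(list(SPACE))
    new[k] = random.choice(SPACE[k])
    return new

current = {k: random.choice(v) for k, v in SPACE.items()}
best, T = dict(current), 1.0
for step in range(500):
    cand = neighbour(current)
    delta = score(cand) - score(current)
    # Accept improvements always; worse moves with probability e^{delta/T}.
    if delta > 0 or random.random() < math.exp(delta / T):
        current = cand
    if score(current) > score(best):
        best = dict(current)
    T *= 0.99                     # geometric cooling schedule

print(best)
```

The acceptance of occasional worsening moves at high temperature is what lets SA escape local optima; as T cools, the search becomes greedy and settles on the best region found.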

2.3.4. Evaluation.

We utilized five classification metrics to evaluate the predictive capabilities of the developed ensemble, ML and DL models: accuracy, precision, recall (sensitivity), specificity, and F1-score. To ensure the robustness and reliability of our evaluation, we implemented 10-fold cross-validation at each step of performance assessment. This methodological approach allows us to generate more stable and generalizable performance estimates by averaging results across different subsets of the data.
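One way to compute these five metrics under 10-fold cross-validation with scikit-learn is sketched below. The classifier and data are placeholders; specificity is obtained as the recall of the negative class via a custom scorer:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate

# Synthetic placeholder for the ICU feature matrix and labels.
X, y = make_classification(n_samples=300, random_state=0)

# Specificity is simply recall computed on the negative class.
scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "sensitivity": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),
    "f1": "f1",
}
cv = cross_validate(LogisticRegression(max_iter=1000), X, y,
                    cv=10, scoring=scoring)
for name in scoring:
    print(name, round(cv[f"test_{name}"].mean(), 3))
```

Averaging each metric over the 10 folds, as done here, yields the kind of stable, generalizable estimate described above.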

Accuracy is the ratio of the number of correct predictions to the total number of predictions [8]. It is computed from True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) as follows:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)

Precision represents the proportion of positive predictions that were indeed correct:

\text{Precision} = \frac{TP}{TP + FP} \qquad (4)

Recall (sensitivity) represents the percentage of actual positive cases that were correctly identified:

\text{Recall} = \frac{TP}{TP + FN} \qquad (5)

Specificity indicates the proportion of actual negative cases that were correctly classified:

\text{Specificity} = \frac{TN}{TN + FP} \qquad (6)

The trade-off between precision and recall can result in a model performing well on one metric but poorly on the other. The F1-score addresses this issue by considering both metrics simultaneously [8]:

F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (7)
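These five definitions can be checked with a few lines of code; the confusion-matrix counts below are illustrative:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the five evaluation metrics from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)       # Eq. (3)
    precision   = tp / (tp + fp)                        # Eq. (4)
    recall      = tp / (tp + fn)                        # Eq. (5), sensitivity
    specificity = tn / (tn + fp)                        # Eq. (6)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (7)
    return accuracy, precision, recall, specificity, f1

# Illustrative counts: 80 TP, 90 TN, 10 FP, 20 FN.
print(classification_metrics(tp=80, tn=90, fp=10, fn=20))
```

With these counts, accuracy is 0.85, recall 0.80, and specificity 0.90; note how the 20 false negatives pull recall below precision, which is exactly the error the sensitivity-focused evaluation below is designed to penalize.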

The sensitivity index stands out as a key evaluation tool in our research due to its emphasis on the FN error, which is a critical factor given the importance of ensuring that patients who need ventilators are accurately identified. To avoid putting patients’ lives at risk due to inaccurate predictions, we aim to minimize the FN error and, consequently, maximize the sensitivity of our model.

Evaluation metrics of the ML and DL models before and after applying the SA and GA algorithms are presented in Tables 4 and 6 respectively, all calculated using 10-fold cross-validation. As these tables show, the best-performing model across metrics is the ensemble tuned with SA.

Based on this comprehensive evaluation of the different models, we determined that the SA-tuned ensemble achieved the best performance in predicting ventilator requirements for patients in the cardiac surgery ICU.

3. Results and discussion

To demonstrate the effectiveness of optimization algorithms in enhancing evaluation metrics, we utilized both SA and GA for hyperparameter tuning across our ML and DL models. Table 4 showcases the evaluation metrics of these models before any hyperparameter tuning was applied. Notably, the ensemble model and XGBoost emerged as frontrunners, with the ensemble achieving the highest scores across all metrics—accuracy, precision, sensitivity, specificity, and F1-score all at 0.816901. In contrast, XGBoost displayed strong performance with the highest sensitivity of 0.809859 among the base classifiers and an impressive F1-score of 0.804196. Table 5 subsequently details the optimized hyperparameters for models such as ANN, LDA, CatBoost, and XGBoost, achieved using both SA and GA, setting the stage for a potential enhancement in their respective performance metrics.

Table 5. ML and DL models with their specific hyperparameters’ settings.

https://doi.org/10.1371/journal.pone.0311250.t005

Table 6 presents a clear view of the performance improvements following hyperparameter tuning with SA and GA across the various models and metrics. Notably, the ensemble model shows distinct enhancements when tuned with SA, achieving superior accuracy, precision, sensitivity, specificity, and F1-score compared to GA tuning. For instance, the SA-tuned ensemble achieved a sensitivity of 0.858491, noticeably higher than the 0.823944 seen with GA tuning. Compared with the untuned ensemble, which displayed uniform metrics of 0.816901 across all categories, tuning yielded substantial gains, particularly in sensitivity and F1-score, highlighting the effectiveness of SA in balancing performance metrics.

Table 6. Evaluation metrics of ML and DL models after hyperparameter tuning using SA algorithm.

https://doi.org/10.1371/journal.pone.0311250.t006

Among the individual base classifiers, the ANN model's sensitivity improved significantly from 0.690141 in its untuned state to 0.774648 with SA, outperforming the improvement to 0.711268 with GA. This pattern indicates that SA is particularly effective for enhancing the ANN's sensitivity. Similarly, XGBoost benefited from SA tuning, which elevated its sensitivity to 0.823944 compared to 0.802817 with GA, reinforcing SA's suitability for this model. Meanwhile, CatBoost's sensitivity improved from an initial 0.760563 to 0.795775 with SA and to 0.781690 with GA, with SA again proving more efficacious.

A closer examination of LDA reveals a slight decrease in sensitivity from 0.725352 to 0.711268 post-tuning, suggesting minimal impact from both SA and GA in this instance. However, across all models, SA generally yielded higher accuracy, precision, and notably, F1-scores, which represent the balance between precision and sensitivity. This is particularly crucial in clinical applications where accurately identifying patients needing ventilators is paramount to prevent adverse outcomes.

The overarching trends from this comprehensive evaluation clearly illustrate that hyperparameter tuning, especially with SA, significantly elevates the performance of ensemble and base classifiers in predicting ventilator requirements in the ICU, with SA typically outperforming GA in optimizing key metrics critical for clinical decision-making.

Table 7 compares the results of the untuned ensemble model, as shown in Table 4, with those of the ensemble models whose base classifiers were tuned using SA and GA, and also with an AutoML approach. This table illustrates that the SA-tuned ensemble achieved the highest scores in accuracy, precision, and F1-score, surpassing those of the GA-tuned ensemble and AutoML, indicating superior performance in balancing sensitivity and specificity.

Table 7. Comparison of tuned ensemble using SA and GA with untuned ensemble and AutoML.

https://doi.org/10.1371/journal.pone.0311250.t007

The optimization plots for the ANN, LDA, CatBoost, and XGBoost models using both SA and GA, depicted in Figs 3 and 4, show that the optimization processes for both algorithms reach a steady state, ensuring that the accuracy indices remain stable and do not degrade over time. These figures provide a visual confirmation of the algorithms’ efficacy in maintaining robust model performance throughout the tuning process.

Fig 3. Optimization plot of ML models using SA algorithm.

https://doi.org/10.1371/journal.pone.0311250.g003

Fig 4. Optimization plot of ML models using GA algorithm.

https://doi.org/10.1371/journal.pone.0311250.g004

Through a comprehensive evaluation, it was determined that the ensemble model whose base classifiers were tuned with SA achieved the best overall performance in predicting ventilator requirements for patients in the cardiac surgery ICU, showcasing an exemplary use of hyperparameter optimization to enhance model reliability and effectiveness.

As we conclude, we acknowledge that our research, which uses data from a single hospital due to data gathering limitations, might affect the generalizability of our findings. Despite employing data partitioning and 10-fold cross-validation to improve reliability, further external validation is necessary. Future studies will aim to incorporate data from multiple hospitals to bolster the robustness and applicability of our model in diverse clinical settings.

After handling missing values, we conduct correlation analysis on the features. We calculate Pearson’s correlation coefficient for each feature against all other features [22]. The visualization of the correlation analysis is provided in Fig 5.
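A minimal sketch of such a pairwise Pearson correlation analysis with pandas follows; the column names and data are hypothetical stand-ins for the study's features:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy stand-in for the ICU table (synthetic values, hypothetical columns).
df = pd.DataFrame({
    "Age": rng.integers(40, 85, 100),
    "Weight": rng.normal(75, 12, 100),
    "Cross_clamp": rng.normal(60, 20, 100),
})
# Binary response stand-in, deliberately tied to Cross_clamp.
df["On_Pump"] = (df["Cross_clamp"] > 60).astype(int)

corr = df.corr(method="pearson")        # pairwise Pearson coefficients
print(corr["On_Pump"].sort_values(ascending=False))
```

Sorting the response column of the correlation matrix, as done here, surfaces the most positively and negatively correlated features at a glance, which is the reading one would take from a heatmap like Fig 5.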

As shown in Fig 5, some features have a high correlation with the response variable (both positive and negative), such as "CABG_Valve", "Cross_clamp", "number_of_grafts", "Age", and "Weight". In related studies, the significance of selected features—such as "CABG_Valve" [64], "Cross_clamp" [65], "Age" [66] and "Weight" [67]—has been a topic of discussion. These features play a crucial role in various contexts, including cardiac surgery outcomes and patient care.

When we compare these highly correlated variables with the results of the feature importance analysis (obtained with RF and GBM), it is clear that several of them also rank among the most important features. This overlap indicates that, in addition to being highly correlated with the response variable, these features have a significant effect on predicting it, so it is important to recognize and attend to them clinically. For example, visualizing the data for 'CABG_Valve' and 'Cross_clamp' (Fig 6) reveals a high proportion of ventilated patients in class 1 for these features. Likewise, as observed in Fig 7, the EF trend decreases as the age group increases; in other words, based on this dataset, the blood exchange between the atrium and ventricle of the heart decreases on average in higher age groups.

Fig 6. Countplot of ’CABG_Valve’ and ’Cross_Clamp’ Based on ’On_Pump’.

https://doi.org/10.1371/journal.pone.0311250.g006

Fig 7. Boxplot of ’EF’ versus ’Grade of Age’ Stratified by ’Sex’.

https://doi.org/10.1371/journal.pone.0311250.g007

This underscores the importance of these clinical features in the cardiac surgery ICU and can help clinical staff pay closer attention to them. Consequently, analyses such as those presented in this study can significantly impact patient care and the management of healthcare resources.

4. Conclusions

In the realm of healthcare, a substantial amount of data is being generated, offering a potential goldmine of insights that can be unearthed using ML and DL techniques. The cardiac surgery ICU holds a distinct position within hospitals, providing ample data for valuable analyses. This study introduces a hybrid model combining ML and DL with a metaheuristic algorithm for predicting ventilator needs in cardiac surgery ICUs, addressing the urgent need to manage limited ventilator resources and prevent the life-threatening and financial consequences of delayed ventilator allocation to critical patients. We propose this model as a timely solution to this critical challenge.

Our analysis revealed that the ensemble model, with its base classifiers tuned using the SA optimization algorithm, achieved superior performance compared to the other models. Furthermore, the results indicated that metaheuristic algorithms such as SA and GA effectively enhanced the model's predictive performance. By leveraging these techniques, we can effectively distinguish between patients admitted to the cardiac surgery ICU who require ventilator assistance and those who do not, enabling medical staff to make informed decisions in critical situations while balancing resource constraints and patient condition.

However, our study does have limitations. The dataset used is from a single center, which may limit the generalizability of our findings. Future work should include validating our model on multi-center datasets to ensure broader applicability. Additionally, further research could explore the use of alternative metaheuristic algorithms or innovative approaches to optimize the hyperparameters of ML and DL models for improved evaluation metrics. Exploring and incorporating alternative feature selection methods could also further optimize the model’s performance.

References

  1. Ghorbani R, Ghousi R, Makui A, Atashi A. A New Hybrid Predictive Model to Predict the Early Mortality Risk in Intensive Care Units on a Highly Imbalanced Dataset. IEEE Access. 2020;8: 141066–141079.
  2. Ghavidel A, Ghousi R, Atashi A. An ensemble data mining approach to discover medical patterns and provide a system to predict the mortality in the ICU of cardiac surgery based on stacking machine learning method. Comput Methods Biomech Biomed Eng Imaging Vis. 2022; 1–11.
  3. Rojek-Jarmuła A, Hombach R, Krzych ŁJ. APACHE II score predicts mortality in patients requiring prolonged ventilation in a weaning center. Anaesthesiol Intensive Ther. 2016;48: 215–219. pmid:27595745
  4. Khwannimit B, Bhurayanontachai R. The performance of customised APACHE II and SAPS II in predicting mortality of mixed critically ill patients in a Thai medical intensive care unit. Anaesth Intensive Care. 2009;37: 784–790. pmid:19775043
  5. Poole D, Rossi C, Latronico N, Rossi G, Finazzi S, Bertolini G. Comparison between SAPS II and SAPS 3 in predicting hospital mortality in a cohort of 103 Italian ICUs. Is new always better? Intensive Care Med. 2012;38: 1280–1288. pmid:22584793
  6. Kong G, Lin K, Hu Y. Using machine learning methods to predict in-hospital mortality of sepsis patients in the ICU. BMC Med Inform Decis Mak. 2020;20. pmid:33008381
  7. Kim JH, Kwon YS, Baek MS. Machine Learning Models to Predict 30-Day Mortality in Mechanically Ventilated Patients. J Clin Med. 2021;10: 2172. pmid:34069799
  8. Jia Y, Kaul C, Lawton T, Murray-Smith R, Habli I. Prediction of weaning from mechanical ventilation using Convolutional Neural Networks. Artif Intell Med. 2021;117. pmid:34127233
  9. Chen W-T, Huang H-L, Ko P-S, Su W, Kao C-C, Su S-L. A Simple Algorithm Using Ventilator Parameters to Predict Successfully Rapid Weaning Program in Cardiac Intensive Care Unit Patients. J Pers Med. 2022;12: 501. pmid:35330500
  10. Yu L, Halalau A, Dalal B, Abbas AE, Ivascu F, Amin M, et al. Machine learning methods to predict mechanical ventilation and mortality in patients with COVID-19. PLoS One. 2021;16: e0249285. pmid:33793600
  11. Rooney SR, Reynolds EL, Banerjee M, Pasquali SK, Charpie JR, Gaies MG, et al. Prediction of extubation failure in the paediatric cardiac ICU using machine learning and high-frequency physiologic data. Cardiol Young. 2022;32: 1649–1656. pmid:34924086
  12. Li L, Zhang Z, Xiong Y, Hu Z, Liu S, Tu B, et al. Prediction of hospital mortality in mechanically ventilated patients with congestive heart failure using machine learning approaches. Int J Cardiol. 2022;358: 59–64. pmid:35483478
  13. Hsieh MH, Hsieh MJ, Chen C-M, Hsieh C-C, Chao C-M, Lai C-C. Comparison of machine learning models for the prediction of mortality of patients with unplanned extubation in intensive care units. Sci Rep. 2018;8: 17116. pmid:30459331
  14. van Meenen DMP, Serpa Neto A, Paulus F, Merkies C, Schouten LR, Bos LD, et al. The predictive validity for mortality of the driving pressure and the mechanical power of ventilation. Intensive Care Med Exp. 2020;8. pmid:33336298
  15. Park H, Park DY. Comparative analysis on predictability of natural ventilation rate based on machine learning algorithms. Build Environ. 2021;195: 107744.
  16. Liang Y, Zhu C, Tian C, Lin Q, Li Z, Li Z, et al. Early prediction of ventilator-associated pneumonia in critical care patients: a machine learning model. BMC Pulm Med. 2022;22: 250. pmid:35752818
  17. Sayed M, Riaño D, Villar J. Predicting Duration of Mechanical Ventilation in Acute Respiratory Distress Syndrome Using Supervised Machine Learning. J Clin Med. 2021;10: 3824. pmid:34501270
  18. Liu C, Yao Z, Liu P, Tu Y, Chen H, Cheng H, et al. Early prediction of MODS interventions in the intensive care unit using machine learning. J Big Data. 2023;10: 55. pmid:37193361
  19. Vali M, Paydar S, Seif M, Sabetian G, Abujaber A, Ghaem H. Prediction prolonged mechanical ventilation in trauma patients of the intensive care unit according to initial medical factors: a machine learning approach. Sci Rep. 2023;13: 5925. pmid:37045979
  20. Otaguro T, Tanaka H, Igarashi Y, Tagami T, Masuno T, Yokobori S, et al. Machine Learning for Prediction of Successful Extubation of Mechanical Ventilated Patients in an Intensive Care Unit: A Retrospective Observational Study. Journal of Nippon Medical School. 2021;88: JNMS.2021_88–508.
  21. Cheng K-H, Tan M-C, Chang Y-J, Lin C-W, Lin Y-H, Chang T-M, et al. The Feasibility of a Machine Learning Approach in Predicting Successful Ventilator Mode Shifting for Adult Patients in the Medical Intensive Care Unit. Medicina (B Aires). 2022;58: 360. pmid:35334536
  22. Zhao Q-Y, Wang H, Luo J-C, Luo M-H, Liu L-P, Yu S-J, et al. Development and Validation of a Machine-Learning Model for Prediction of Extubation Failure in Intensive Care Units. Front Med (Lausanne). 2021;8. pmid:34079812
  23. Chen T, Xu J, Ying H, Chen X, Feng R, Fang X, et al. Prediction of Extubation Failure for Intensive Care Unit Patients Using Light Gradient Boosting Machine. IEEE Access. 2019;7: 150960–150968.
  24. Arvind V, Kim JS, Cho BH, Geng E, Cho SK. Development of a machine learning algorithm to predict intubation among hospitalized patients with COVID-19. J Crit Care. 2021;62: 25–30. pmid:33238219
  25. Shashikumar SP, Wardi G, Paul P, Carlile M, Brenner LN, Hibbert KA, et al. Development and Prospective Validation of a Deep Learning Algorithm for Predicting Need for Mechanical Ventilation. Chest. 2021;159: 2264–2273. pmid:33345948
  26. Bahrami A, Ghousi R, Atashi A, Barzinpour F. Presenting a Two-Stage Hybrid Model for allocating advanced Ventilators using Machine Learning methods: a case study. IEEE Access. 2024; 1–1.
  27. Ali Y, Awwad E, Al-Razgan M, Maarouf A. Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity. Processes. 2023;11: 349.
  28. Mansoori A, Zeinalnezhad M, Nazarimanesh L. Optimization of Tree-Based Machine Learning Models to Predict the Length of Hospital Stay Using Genetic Algorithm. J Healthc Eng. 2023;2023: 1–14. pmid:36824405
  29. Ahmed A, Al-Maamari M, Firouz M, Delen D. An Adaptive Simulated Annealing-Based Machine Learning Approach for Developing an E-Triage Tool for Hospital Emergency Operations. Information Systems Frontiers. 2023.
  30. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. Journal of Machine Learning Research. 2012;13: 281–305.
  31. Nematzadeh S, Kiani F, Torkamanian-Afshar M, Aydin N. Tuning hyperparameters of machine learning algorithms and deep neural networks using metaheuristics: A bioinformatics study on biomedical and biological cases. Comput Biol Chem. 2022;97: 107619. pmid:35033837
  32. Alibrahim H, Ludwig SA. Hyperparameter Optimization: Comparing Genetic Algorithm against Grid Search and Bayesian Optimization. 2021 IEEE Congress on Evolutionary Computation (CEC). IEEE; 2021. pp. 1551–1559.
  33. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst. 2018;31.
  34. Anowar F, Sadaoui S, Selim B. Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Comput Sci Rev. 2021;40: 100378.
  35. Shirwaikar RD, Acharya U D, Makkithaya K, M S, Srivastava S, Lewis U LES. Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction. Artif Intell Med. 2019;98: 59–76. pmid:31521253
  36. Le TT, Fu W, Moore JH. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics. 2020;36: 250–256. pmid:31165141
  37. Radzi SFM, Karim MKA, Saripan MI, Rahman MAA, Isa INC, Ibahim MJ. Hyperparameter Tuning and Pipeline Optimization via Grid Search Method and Tree-Based AutoML in Breast Cancer Prediction. J Pers Med. 2021;11: 978. pmid:34683118
  38. Zulfiqar M, Gamage KAA, Kamran M, Rasheed MB. Hyperparameter Optimization of Bayesian Neural Network Using Bayesian Optimization and Intelligent Feature Engineering for Load Forecasting. Sensors. 2022;22: 4446. pmid:35746227
  39. Kalliola J, Kapočiūtė-Dzikienė J, Damaševičius R. Neural network hyperparameter optimization for prediction of real estate prices in Helsinki. PeerJ Comput Sci. 2021;7: e444. pmid:33977129
  40. Pawlicki M, Kozik R, Choraś M. Artificial Neural Network Hyperparameter Optimisation for Network Intrusion Detection. 2019. pp. 749–760.
  41. Ali H, Muthudoss P, Chauhan C, Kaliappan I, Kumar D, Paudel A, et al. Machine Learning-Enabled NIR Spectroscopy. Part 3: Hyperparameter by Design (HyD) Based ANN-MLP Optimization, Model Generalizability, and Model Transferability. AAPS PharmSciTech. 2023;24: 254. pmid:38062329
  42. Parsa M, Mitchell JP, Schuman CD, Patton RM, Potok TE, Roy K. Bayesian Multi-objective Hyperparameter Optimization for Accurate, Fast, and Efficient Neural Network Accelerator Design. Front Neurosci. 2020;14. pmid:32848531
  43. Cheng H-C, Ma C-L, Liu Y-L. Development of ANN-Based Warpage Prediction Model for FCCSP via Subdomain Sampling and Taguchi Hyperparameter Optimization. Micromachines (Basel). 2023;14: 1325. pmid:37512636
  44. Ogundokun RO, Misra S, Douglas M, Damaševičius R, Maskeliūnas R. Medical Internet-of-Things Based Breast Cancer Diagnosis Using Hyperparameter-Optimized Neural Networks. Future Internet. 2022;14: 153.
  45. Janhuaton T, Ratanavaraha V, Jomnonkwao S. Forecasting Thailand's Transportation CO2 Emissions: A Comparison among Artificial Intelligent Models. Forecasting. 2024;6: 462–484.
  46. Tsipi L, Karavolos M, Bithas P, Vouyioukas D. Machine Learning-Based Methods for Enhancement of UAV-NOMA and D2D Cooperative Networks. Sensors. 2023;23: 3014. pmid:36991727
  47. Muhajir D, Akbar M, Bagaskara A, Vinarti R. Improving classification algorithm on education dataset using hyperparameter tuning. Procedia Comput Sci. 2022;197: 538–544.
  48. Cioccia G, Pereira de Morais C, Babos DV, Milori DMBP, Alves CZ, Cena C, et al. Laser-Induced Breakdown Spectroscopy Associated with the Design of Experiments and Machine Learning for Discrimination of Brachiaria brizantha Seed Vigor. Sensors. 2022;22: 5067. pmid:35890747
  49. Höhne J, Bartz D, Hebart MN, Müller K-R, Blankertz B. Analyzing neuroimaging data with subclasses: A shrinkage approach. Neuroimage. 2016;124: 740–751. pmid:26407815
  50. Yang F, Mao KZ, Lee GKK, Tang W. Emphasizing Minority Class in LDA for Feature Subset Selection on High-Dimensional Small-Sized Problems. IEEE Trans Knowl Data Eng. 2015;27: 88–101.
  51. Imani M, Arabnia HR. Hyperparameter Optimization and Combined Data Sampling Techniques in Machine Learning for Customer Churn Prediction: A Comparative Analysis. Technologies (Basel). 2023;11: 167.
  52. Asif D, Bibi M, Arif MS, Mukheimer A. Enhancing Heart Disease Prediction through Ensemble Learning Techniques with Hyperparameter Optimization. Algorithms. 2023;16: 308.
  53. Peng Y, Liu Y, Wang J, Li X. A Novel Framework for Risk Warning That Utilizes an Improved Generative Adversarial Network and Categorical Boosting. Electronics (Basel). 2024;13: 1538.
  54. Luo M, Wang Y, Xie Y, Zhou L, Qiao J, Qiu S, et al. Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass. Forests. 2021;12: 216.
  55. Almars AM, Alwateer M, Qaraad M, Amjad S, Fathi H, Kelany AK, et al. Brain Cancer Prediction Based on Novel Interpretable Ensemble Gene Selection Algorithm and Classifier. Diagnostics. 2021;11: 1936. pmid:34679634
  56. Ogar VN, Hussain S, Gamage KAA. Transmission Line Fault Classification of Multi-Dataset Using CatBoost Classifier. Signals. 2022;3: 468–482.
  57. Khan MA, Asad B, Vaimann T, Kallaste A, Pomarnacki R, Hyunh VK. Improved Fault Classification and Localization in Power Transmission Networks Using VAE-Generated Synthetic Data and Machine Learning Algorithms. Machines. 2023;11: 963.
  58. Seydi ST, Kanani-Sadat Y, Hasanlou M, Sahraei R, Chanussot J, Amani M. Comparison of Machine Learning Algorithms for Flood Susceptibility Mapping. Remote Sens (Basel). 2022;15: 192.
  59. Varentsov M, Krinitskiy M, Stepanenko V. Machine Learning for Simulation of Urban Heat Island Dynamics Based on Large-Scale Meteorological Conditions. Climate. 2023;11: 200.
  60. Hamida S, El Gannour O, Cherradi B, Ouajji H, Raihani A. Optimization of Machine Learning Algorithms Hyper-Parameters for Improving the Prediction of Patients Infected with COVID-19. 2020 IEEE 2nd International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS). IEEE; 2020. pp. 1–6.
  61. Omotehinwa TO, Oyewola DO. Hyperparameter Optimization of Ensemble Models for Spam Email Detection. Applied Sciences. 2023;13: 1971.
  62. Alahmadi R, Almujibah H, Alotaibi S, Elshekh Ali EA, Alsharif M, Bakri M. Explainable Boosting Machine: A Contemporary Glass-Box Model to Analyze Work Zone-Related Road Traffic Crashes. Safety. 2023;9: 83.
  63. Abdul Samad SR, Balasubaramanian S, Al-Kaabi AS, Sharma B, Chowdhury S, Mehbodniya A, et al. Analysis of the Performance Impact of Fine-Tuned Machine Learning Model for Phishing URL Detection. Electronics (Basel). 2023;12: 1642.
  64. Trouillet J-L, Combes A, Vaissier E, Luyt C-E, Ouattara A, Pavie A, et al. Prolonged mechanical ventilation after cardiac surgery: Outcome and predictors. J Thorac Cardiovasc Surg. 2009;138: 948–953. pmid:19660336
  65. Al-Sarraf N, Thalib L, Hughes A, Houlihan M, Tolan M, Young V, et al. Cross-clamp time is an independent predictor of mortality and morbidity in low- and high-risk cardiac patients. International Journal of Surgery. 2011;9: 104–109. pmid:20965288
  66. Cohen IL, Lambrinos J. Investigating the Impact of Age on Outcome of Mechanical Ventilation Using a Population of 41,848 Patients From a Statewide Database. Chest. 1995;107: 1673–1680. pmid:7781366
  67. Anzueto A, Frutos-Vivar F, Esteban A, Bensalami N, Marks D, Raymondos K, et al. Influence of body mass index on outcome of the mechanically ventilated patients. Thorax. 2011;66: 66–73. pmid:20980246