
RRMSE-enhanced weighted voting regressor for improved ensemble regression

Abstract

Ensemble regression methods are widely used to improve prediction accuracy by combining multiple regression models, especially when dealing with continuous numerical targets. However, most ensemble voting regressors use equal weights for each base model’s predictions, which can limit their effectiveness, particularly when there is no specific domain knowledge to guide the weighting. This uniform weighting approach does not account for the fact that some models perform better than others on a given dataset, leaving room for improvement in optimizing ensemble performance. To overcome this limitation, we propose the RRMSE (Relative Root Mean Square Error) Voting Regressor, a new ensemble regression technique that assigns weights to each base model based on their relative error rates. By using an RRMSE-based weighting function, our method gives more importance to models that demonstrate higher accuracy, thereby enhancing the overall prediction quality. We tested the RRMSE Voting Regressor on six popular regression datasets and compared its performance with several state-of-the-art ensemble regression algorithms. The results show that the RRMSE Voting Regressor consistently achieves lower prediction errors than existing methods across all tested datasets. This improvement highlights the effectiveness of using relative error metrics for weighting in ensemble models. Our approach not only fills a gap in current ensemble regression techniques but also provides a reliable and adaptable method for boosting prediction performance in various machine learning tasks. By leveraging the strengths of individual models through principled weighting, the RRMSE Voting Regressor offers a significant advancement in the field of ensemble learning.

1. Introduction

Ensemble learning is a powerful machine learning paradigm that combines multiple models, often referred to as “base learners,” to solve the same predictive problem. This approach is applicable to both classification and regression tasks, where the goal is to predict categorical labels or continuous numerical values, respectively. The fundamental idea behind ensemble learning is that by aggregating the strengths of diverse models, the ensemble can achieve higher accuracy and greater robustness than any individual base learner alone [13].

Formally, an ensemble $F$ is defined as a collection of predictors aimed at approximating a target function $f$. Each predictor in the ensemble, denoted $f_i$, represents a different hypothesis or model trained on the same dataset. Mathematically, the ensemble can be expressed as:

$$F = \{f_1, f_2, \ldots, f_k\} \tag{1}$$

Here, $k$ represents the number of base learners in the ensemble. The goal is to combine these base learners in a manner that leverages their individual strengths while mitigating their weaknesses. The final prediction of the ensemble, $\hat{f}(x)$, is typically obtained by aggregating the outputs of the base learners. This aggregation can be achieved through various methods, such as averaging in regression tasks or majority voting in classification tasks.

The ensemble learning process generally involves three main steps [4,5]. The first step is ensemble generation, which entails selecting and training the base learners to be included in the ensemble. Depending on whether the same algorithm is used to generate all base learners, ensembles can be categorized as homogeneous or heterogeneous. A homogeneous ensemble consists of base learners that are all the same type, such as multiple decision trees trained with different subsets of the data [6]. In contrast, a heterogeneous ensemble includes base learners of different types, such as combining decision trees, neural networks, and support vector machines within a single ensemble [7].

The second step is ensemble pruning, where the ensemble is refined by removing some of the base learners. This step aims to eliminate models that contribute little to the overall performance, thereby reducing computational complexity and avoiding overfitting. Effective pruning strategies can enhance the efficiency and effectiveness of the ensemble by focusing on the most promising models. The final step is ensemble integration, which involves combining the predictions of the remaining base learners to produce the final output. For regression problems, this integration is typically performed using a linear combination of the individual predictions:

$$\hat{f}(x) = \sum_{i=1}^{k} h_i(x)\, f_i(x) \tag{2}$$

Here, $h_i(x)$ represents the weighting function assigned to the $i$-th base learner’s prediction. The choice of weighting functions is crucial, as it determines the influence each base learner has on the final prediction. Ensemble integration strategies can be broadly classified into two categories: constant weighting functions [8], where the $h_i(x)$ are fixed constants, and non-constant weighting functions [8], where the weights vary depending on the input features $x$ [9].

Despite the proven advantages of ensemble learning, a significant challenge remains in effectively determining the appropriate weights for combining base learners, especially in scenarios lacking domain-specific knowledge. Typically, ensemble voting regressors employ uniform weights, assigning equal importance to all base models regardless of their individual performance. This uniform weighting can limit the potential improvements in prediction accuracy, as it fails to account for the varying reliability and strengths of different models across diverse datasets [10].

To address this limitation, this study introduces the RRMSE (Relative Root Mean Square Error) Voting Regressor, a novel ensemble regression technique that leverages RRMSE to dynamically assign weights to each base model. By using RRMSE as the basis for weighting, the proposed method prioritizes models that exhibit lower relative errors, thereby enhancing the overall prediction quality of the ensemble. This approach provides a systematic and data-driven mechanism for weight assignment, even in the absence of extensive domain knowledge, thereby bridging the existing gap in weighted ensemble regression methods.

To evaluate the effectiveness of the RRMSE Voting Regressor, comprehensive experiments are conducted on six widely recognized regression datasets. These experiments compared the performance of the proposed method against several state-of-the-art ensemble regression algorithms, including traditional uniform-weighted ensembles and other advanced weighted approaches. The results demonstrate that the RRMSE Voting Regressor consistently achieves lower prediction errors across all tested datasets, highlighting its superior ability to enhance ensemble performance through intelligent weighting. This approach not only improves prediction accuracy but also contributes to the broader field of ensemble learning by offering a reliable and adaptable method for boosting predictive performance in various machine learning applications.

2. Ensemble learning for regression

Ensemble learning is a widely adopted approach in machine learning that aims to improve the performance of predictive models by combining multiple base learners. In regression tasks, ensemble methods are particularly useful for enhancing prediction accuracy and robustness by leveraging the strengths of diverse models. Ensemble methods for regression can be broadly classified into two main categories: averaging methods and boosting methods. This study focuses on averaging methods, specifically on developing a weighting function for heterogeneous ensemble models. Consequently, only the common averaging methods are discussed in detail below.

2.1 Averaging methods

Averaging methods are fundamental ensemble techniques that involve training multiple base learners independently and then combining their predictions by averaging. The primary advantage of averaging is its simplicity and effectiveness in reducing the variance of individual models, leading to more stable and accurate predictions. Averaging methods can be further divided into various approaches, each with its unique way of generating and combining base learners.

2.1.1 Bagging regression.

Bagging, short for Bootstrap Aggregating [11], is one of the most popular averaging methods in ensemble learning. It aims to reduce the variance of base learners, thereby enhancing the overall stability and accuracy of the ensemble. Bagging achieves this by creating multiple subsets of the original training dataset through bootstrapping, which involves random sampling with replacement.

Mathematically, let $D = \{(x_j, y_j)\}_{j=1}^{n}$ denote the original training dataset, where $x_j$ represents the feature vector and $y_j$ is the target variable. Bagging generates $k$ bootstrap samples $D_i$ for $i = 1, \ldots, k$, each of size $n$, by sampling with replacement from $D$. For each bootstrap sample $D_i$, a base regression model $f_i$ is trained independently.

The ensemble prediction for a new input x is then obtained by averaging the predictions of all base learners:

$$\hat{f}(x) = \frac{1}{k} \sum_{i=1}^{k} f_i(x) \tag{3}$$

By averaging the predictions, bagging effectively reduces the variance associated with individual models, leading to more reliable and generalized predictions [12], as shown in Figure 1. Bagging is particularly effective with high-variance, low-bias models such as decision trees, making it a popular choice in ensemble regression frameworks.

Fig 1. Bagging consists of fitting several base models on different bootstrap samples and building an ensemble model that averages the results of these base learners.

https://doi.org/10.1371/journal.pone.0319515.g001

Here, we extend the traditional bagging approach by developing a heterogeneous bagging algorithm. Unlike homogeneous bagging, where all base learners are of the same type, heterogeneous bagging incorporates diverse types of base models (e.g., decision trees, support vector machines, neural networks) within the ensemble. This diversity can further enhance the ensemble’s performance by capturing a wider range of data patterns and reducing the risk of model-specific biases. The heterogeneous bagging algorithm developed for this work serves as a benchmark for comparing the effectiveness of the proposed RRMSE Voting Regressor.
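A minimal sketch of this idea follows (assumed structure only, since the paper does not show its implementation): each bootstrap sample trains a different model type, and predictions are averaged uniformly.

```python
# Heterogeneous bagging: diverse base learners, each fit on its own
# bootstrap sample, combined by a uniform average.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def fit_heterogeneous_bagging(models, X, y, seed=0):
    rng = np.random.default_rng(seed)
    fitted = []
    for model in models:
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap: with replacement
        fitted.append(clone(model).fit(X[idx], y[idx]))
    return fitted

def predict_heterogeneous_bagging(fitted, X):
    return np.mean([m.predict(X) for m in fitted], axis=0)

X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)
fitted = fit_heterogeneous_bagging(
    [LinearRegression(), KNeighborsRegressor(), DecisionTreeRegressor(random_state=0)],
    X, y)
pred = predict_heterogeneous_bagging(fitted, X)
```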

2.1.2 Voting regression.

Voting regression is another widely used averaging ensemble method. Unlike bagging, which typically uses identical base learners trained on different subsets of data, voting regression combines predictions from different types of regression models trained on the entire dataset. This method leverages the unique strengths of each base model to improve overall prediction accuracy.

In voting regression, each base model makes a prediction for the target variable, and these predictions are then combined, usually by averaging, to produce the final ensemble prediction. The way these predictions are combined is crucial, as it directly influences the accuracy and reliability of the final output.

Figure 2 illustrates the structure of a voting regression model. Each base learner is assigned a specific weight, which determines its influence on the final prediction. Properly assigning these weights is essential for maximizing the ensemble’s performance, as it ensures that more accurate models have a greater impact on the final prediction. Here, we focus on developing a dynamic weighting function for heterogeneous ensembles, allowing for more flexible and accurate combination of diverse base learners. This approach contrasts with traditional voting regressors that typically use uniform weights, which may not fully exploit the potential of each base model’s strengths.
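Scikit-learn’s `VotingRegressor` implements exactly this scheme; its optional `weights` argument replaces the default uniform average. The weights below are illustrative constants, not learned values:

```python
# Weighted voting regression over heterogeneous base learners.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=400, n_features=6, noise=5.0, random_state=0)

vr = VotingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("knn", KNeighborsRegressor()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    weights=[0.2, 0.3, 0.5],  # fixed, hand-chosen weights for illustration
)
vr.fit(X, y)
pred = vr.predict(X)
```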

Fig 2. Voting regression involves fitting several base models on the entire dataset and building an ensemble model that averages the results of these base learners, with each base learner assigned its own weight.

https://doi.org/10.1371/journal.pone.0319515.g002

2.2 Weighting functions

The effectiveness of ensemble methods largely depends on how the predictions of base learners are combined. Weighted averaging introduces a mechanism to assign different weights to base learners based on their performance, thereby enhancing the ensemble’s overall predictive power. This section reviews related work on weighting functions used for ensemble integration. Equation 4 shows that ensemble integration is performed using a linear combination of the base learners’ predictions:

$$\hat{f}(x) = \sum_{i=1}^{k} h_i(x)\, f_i(x) \tag{4}$$

Here, $h_i(x)$ represents the weighting function assigned to the $i$-th base learner’s prediction. Weighting functions can be categorized into two types: constant weighting functions, where $h_i(x)$ are fixed coefficients, and non-constant weighting functions, where the weights vary depending on the input features $x$.

2.2.1 Constant weighting functions.

Constant weighting functions assign the same set of coefficients to base learners, regardless of the input data. This approach simplifies the integration process but may not fully leverage the individual strengths of each base model [13]. For example, the Basic Ensemble Method (BEM) proposed by [14] uses equal weights for all base learners:

$$\hat{f}_{\text{BEM}}(x) = \frac{1}{k} \sum_{i=1}^{k} f_i(x) \tag{5}$$

where

$$m_i(x) = f(x) - f_i(x) \tag{6}$$

BEM assumes that the errors $m_i(x)$ are independent and have zero mean. To address potential issues with multi-collinearity, [14] modified BEM to create the Generalized Ensemble Method (GEM):

$$\hat{f}_{\text{GEM}}(x) = \sum_{i=1}^{k} \alpha_i\, f_i(x) \tag{7}$$

where

$$\alpha_i = \frac{\sum_{j=1}^{k} C_{ij}^{-1}}{\sum_{l=1}^{k} \sum_{j=1}^{k} C_{lj}^{-1}}, \qquad C_{ij} = \mathbb{E}\left[ m_i(x)\, m_j(x) \right] \tag{8}$$

However, GEM faces challenges such as multi-collinearity, which leads to wide confidence intervals for the coefficients due to the necessity of calculating the inverse matrix $C^{-1}$. To mitigate this issue, [15] integrated the ensemble integration phase into the ensemble selection phase. By selecting models from a pool to include in the ensemble and using simple averaging as the weighting function, the coefficients are implicitly determined based on the frequency each model is selected. Another approach by [16] utilizes Principal Component Regression (PCR) to address multi-collinearity. PCR determines principal components and selects the number of components to use based on the amount of variance they explain, thereby reducing the complexity and variance of the coefficient estimates.
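The GEM weights can be estimated in a few lines of NumPy. The sketch below (an illustration under assumptions, not [14]'s original code) estimates the misfit covariance on validation data and uses a pseudo-inverse to sidestep the ill-conditioning discussed above:

```python
# GEM-style constant weights: alpha_i proportional to the row sums of the
# inverse misfit-covariance matrix, normalized to sum to one.
import numpy as np

def gem_weights(preds, y):
    """preds: (k, n) array of base-learner predictions; y: (n,) targets."""
    m = preds - y                  # misfits m_i(x)
    C = m @ m.T / m.shape[1]       # estimated covariance C_ij = E[m_i m_j]
    Cinv = np.linalg.pinv(C)       # pseudo-inverse guards against singular C
    return Cinv.sum(axis=1) / Cinv.sum()

# Toy example: two base learners, four validation points.
preds = np.array([[1.1, 2.1, 2.9, 4.2],
                  [0.8, 2.3, 3.2, 3.9]])
y = np.array([1.0, 2.0, 3.0, 4.0])
alpha = gem_weights(preds, y)
```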

2.2.2 Non-constant weighting functions.

Non-constant weighting functions assign different weights to base learners based on the input features x, allowing the ensemble to adapt its predictions dynamically. Unlike constant weighting, non-constant weighting can capture more nuanced relationships between the input data and model performance [17].

Static weighting functions use predefined coefficients based on prior domain knowledge or expertise, which do not change with the input data. In contrast, dynamic weighting functions determine the weights $h_i(x)$ based on the performance of base learners for specific input instances. This allows the ensemble to prioritize models that perform better on similar data points [18].

For instance, [19] and [20] employ k-nearest neighbors with Euclidean distance and weighted k-nearest neighbors to select predictors dynamically. Dynamic weighting [21] assigns weights to each base learner based on their localized performance within a group of k-nearest neighbors. The final prediction is then computed as a weighted average of the predictions from these selected models. Here, we implemented a Dynamic Weighting Voting Regressor (DWR) to compare its performance with the proposed RRMSE Voting Regressor. By dynamically assigning weights based on model performance in specific regions of the input space, DWR serves as a relevant baseline for evaluating the effectiveness of our RRMSE-based weighting approach.
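The local-weighting idea can be sketched as follows (an illustration with assumed details, not the exact DWR of [21]): for each query point, each base learner is weighted by its inverse mean absolute error over the k nearest training instances.

```python
# Dynamic weighting: per-query weights from localized model performance.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor, NearestNeighbors

def dynamic_weighted_predict(fitted, X_train, y_train, X_query, k=5, eps=1e-9):
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_query)                      # (n_query, k) neighbour indices
    train_preds = np.stack([m.predict(X_train) for m in fitted])  # (M, n_train)
    query_preds = np.stack([m.predict(X_query) for m in fitted])  # (M, n_query)
    out = np.empty(len(X_query))
    for q, neigh in enumerate(idx):
        # local absolute error of each model in the query's neighbourhood
        err = np.abs(train_preds[:, neigh] - y_train[neigh]).mean(axis=1)
        w = 1.0 / (err + eps)                            # inverse-error weights
        out[q] = w @ query_preds[:, q] / w.sum()
    return out

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X_tr, y_tr, X_te = X[:240], y[:240], X[240:]
fitted = [m.fit(X_tr, y_tr) for m in (LinearRegression(), KNeighborsRegressor())]
pred = dynamic_weighted_predict(fitted, X_tr, y_tr, X_te)
```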

In summary, ensemble learning for regression encompasses a variety of methods aimed at improving prediction accuracy and robustness by combining multiple base learners. Averaging methods, such as bagging and voting regression, offer straightforward approaches to reducing model variance through parallel training and prediction aggregation. Boosting methods, although not the focus of this study, provide alternative strategies for sequentially enhancing model performance by addressing residual errors. Weighted averaging introduces a critical enhancement by allowing differential influence of base learners based on their performance, which is especially beneficial in heterogeneous ensembles where diversity among base models is leveraged for improved predictions.

3. RRMSE voting regressor

Relative Root Mean Squared Error (RRMSE) extends the Root Mean Squared Error (RMSE) by normalizing it with the scale of the actual values. This normalization facilitates a relative comparison of errors across different models and datasets. RMSE is defined as:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{f}_i - f_i \right)^2} \tag{9}$$

where:

  • $\hat{f}_i$ is the predicted value for the $i$-th sample,
  • $f_i$ is the actual value for the $i$-th sample,
  • n is the total number of samples.

To account for the scale of the target variable, RRMSE is defined as:

$$\text{RRMSE} = \frac{\text{RMSE}}{\bar{f}}, \qquad \bar{f} = \frac{1}{n} \sum_{i=1}^{n} f_i \tag{10}$$

RRMSE provides a dimensionless measure of error, enabling fair comparisons between models operating on different scales. A well-performing ensemble comprises accurate predictors with errors distributed across various regions of the input space. By decomposing the generalization error into different components, RRMSE serves as a dynamic weighting function to optimize ensemble integration.
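Equations 9 and 10 translate directly into a short helper. This sketch assumes, as the text suggests, that RRMSE normalizes RMSE by the mean magnitude of the actual values:

```python
# RRMSE: RMSE normalized by the scale of the actual values.
import numpy as np

def rrmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return rmse / np.abs(np.mean(y_true))

# Errors of +-10 on targets with mean 200 give RMSE 10, hence RRMSE 0.05.
rel_err = rrmse([100, 200, 300], [110, 190, 310])
```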

The RRMSE Voting Regressor assigns weights to base learners based on their RRMSE values, ensuring that more accurate models have a greater influence on the final prediction. The algorithm consists of four main phases: initialization, RRMSE calculation, weight assignment, and ensemble prediction, as shown in Algorithm 1.

The algorithm operates through four main phases. In the first phase, a constant $\epsilon$ is calculated to prevent division by zero during weight assignment. This constant is derived as the average absolute deviation of the training labels from their mean:

$$\epsilon = \frac{1}{n} \sum_{j=1}^{n} \left| y_j - \bar{y} \right|, \qquad \bar{y} = \frac{1}{n} \sum_{j=1}^{n} y_j$$

During the second phase, the RRMSE for each base learner is computed using Equation 10. These RRMSE values are stored in an array for subsequent weight calculation. In the third phase, weights $W_k$ are assigned to each base learner inversely proportional to their RRMSE values:

$$W_k = \frac{1 / \left( \text{RRMSE}_k + \epsilon \right)}{\sum_{l=1}^{K} 1 / \left( \text{RRMSE}_l + \epsilon \right)}$$

This normalization ensures that the sum of all weights equals one, allowing for a balanced contribution from each base learner based on their relative performance.

Finally, in the fourth phase, ensemble predictions are generated by computing the weighted sum of the base learners’ predictions for each test sample:

$$\hat{y}(x) = \sum_{k=1}^{K} W_k\, f_k(x)$$

This weighted averaging integrates the strengths of individual base learners, enhancing the overall accuracy and reliability of the ensemble. Using RRMSE as a weighting function ensures that the influence of each base learner is proportional to its relative error. By normalizing RMSE, RRMSE accounts for the scale of the target variable, making the weighting mechanism robust across different datasets. The inverse proportionality of weights to RRMSE emphasizes models with lower relative errors, thereby optimizing the ensemble’s predictive performance. Moreover, the normalization step in weight assignment ensures that the sum of all weights equals one, maintaining a balanced contribution from each base learner. This approach mitigates the risk of any single model disproportionately dominating the ensemble, promoting a more harmonious integration of diverse models.
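The four phases can be sketched as a small estimator class. This is an illustration under assumptions, not the authors' code: the class name is hypothetical, RRMSE is computed on the training data, and the epsilon placement follows the paper's description.

```python
# Sketch of Algorithm 1: initialization, RRMSE calculation,
# weight assignment, and weighted ensemble prediction.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

class RRMSEVotingRegressor:
    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y):
        # Phase 1: epsilon = mean absolute deviation of the labels (avoids /0)
        eps = np.mean(np.abs(y - y.mean()))
        rrmses = []
        for est in self.estimators:
            est.fit(X, y)
            # Phase 2: RRMSE of each base learner (Equation 10)
            rmse = np.sqrt(np.mean((est.predict(X) - y) ** 2))
            rrmses.append(rmse / (np.abs(y.mean()) + eps))
        # Phase 3: weights inversely proportional to RRMSE, normalized to one
        inv = 1.0 / (np.asarray(rrmses) + eps)
        self.weights_ = inv / inv.sum()
        return self

    def predict(self, X):
        # Phase 4: weighted sum of the base learners' predictions
        preds = np.stack([est.predict(X) for est in self.estimators])
        return self.weights_ @ preds

X, y = make_regression(n_samples=400, n_features=6, noise=8.0, random_state=0)
model = RRMSEVotingRegressor([LinearRegression(), KNeighborsRegressor()]).fit(X, y)
pred = model.predict(X)
```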

4. Experiments

To evaluate the effectiveness of the RRMSE Voting Regressor, we conducted experiments using six well-known regression datasets, as listed in Table 1. These datasets vary in domain, size, and feature count, providing a comprehensive assessment of the proposed method. Specifically, the Abalone dataset [22] comprises 4,177 instances with 8 physical measurements, aiming to predict the age of Abalone shells. The Car dataset [22] includes 398 instances and 7 features, with the target variable being fuel consumption measured in miles per gallon (mpg). The Diamond dataset contains 53,940 instances and 9 attributes, focusing on predicting diamond prices. The Airfoil dataset [22] consists of 1,503 instances with 5 features, used to predict the scaled sound pressure level based on aerodynamic and acoustic measurements. The Smart Grid Stability dataset [22] includes 60,000 instances and 12 features, relating to the local stability analysis of a 4-node star system implementing decentralized smart grid control. Lastly, the Elongation dataset contains 385 instances with 17 chemical components as input attributes, aimed at predicting the elongation of steel.

Table 1. Datasets used in experiments and their properties.

https://doi.org/10.1371/journal.pone.0319515.t001

To benchmark the performance of the RRMSE Voting Regressor, three state-of-the-art ensemble regression algorithms are also evaluated. The first baseline is the Voting Regressor with uniform weights (VRU), implemented using the Scikit-Learn library [23]. The second baseline is the Bagging Regressor with uniform weights (BR), which extends the traditional bagging approach to a heterogeneous ensemble. The third baseline is the DWR [21], which assigns weights to base learners based on their localized performance. While VRU is readily available in Scikit-Learn, both BR and DWR are implemented by the author in Python to ensure a fair comparison with the proposed method.

Model performance was assessed using multiple evaluation metrics to provide a comprehensive understanding of each algorithm’s predictive capabilities. In addition to RMSE (cf. Equation 9), the following metrics are employed:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{f}_i - f_i \right|, \qquad \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{f}_i - f_i \right)^2 \tag{11}$$

Mean Absolute Error (MAE) measures the average magnitude of errors without considering their direction. It is calculated as the average of the absolute differences between predicted and actual values. Mean Squared Error (MSE) is another widely used loss metric that penalizes larger errors more heavily than smaller ones. It is defined as the average of the squared differences between predicted and actual values.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( f_i - \hat{f}_i \right)^2}{\sum_{i=1}^{n} \left( f_i - \bar{f} \right)^2} \tag{12}$$

The $R^2$ score, or coefficient of determination, represents the proportion of variance in the target variable that is explained by the model. It provides an indication of the model’s goodness of fit, with higher values indicating better explanatory power.

For each ensemble algorithm, the same set of base learners was utilized to ensure consistency in comparison. These base learners include Linear Regression (LR), K-Nearest Neighbors Regression (KNN), Stochastic Gradient Descent Regression (SGD), and Random Forest Regression (RF). All four regression models are implemented using the Scikit-Learn library [23]. The datasets are split into training and testing sets using an 80%-20% ratio, respectively. This split allows sufficient data for training while reserving a representative portion for unbiased evaluation of model performance.
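The evaluation protocol can be sketched end to end with scikit-learn: the paper's four base learners, an 80%-20% split, and the four reported metrics. The synthetic dataset below is a stand-in for the six real datasets, so only the pipeline, not the numbers, mirrors the experiments.

```python
# Uniform-weight voting baseline (VRU) evaluated with MAE, MSE, RMSE, R^2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=1000, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

vru = VotingRegressor([            # uniform weights by default
    ("lr", LinearRegression()),
    ("knn", KNeighborsRegressor()),
    ("sgd", SGDRegressor(random_state=0)),
    ("rf", RandomForestRegressor(random_state=0)),
]).fit(X_tr, y_tr)

pred = vru.predict(X_te)
mae = mean_absolute_error(y_te, pred)
mse = mean_squared_error(y_te, pred)
rmse = np.sqrt(mse)                # version-safe RMSE
r2 = r2_score(y_te, pred)
```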

5. Results

The performance of the RRMSE Voting Regressor was evaluated and compared against three other ensemble methods: VRU, BR, and DWR. All base learners were configured with default hyperparameters to ensure a fair comparison. Table 2 presents the results across four regression metrics (MAE, MSE, RMSE, and $R^2$ score) for each of the six testing datasets. The numbers in parentheses indicate the relative ranking of each algorithm for a specific metric and dataset, where a lower rank signifies better performance.

Table 2. Performance of different algorithms across various regression metrics and datasets.

https://doi.org/10.1371/journal.pone.0319515.t002

Table 2 reveals that the RRMSE Voting Regressor outperforms the other three algorithms in five out of six datasets across most metrics. The exception is the Abalone dataset, where RRMSE achieves the second-best average rank. This indicates that while RRMSE generally excels, there are specific instances where other methods like VRU may perform comparably or better.

To determine the statistical significance of these performance differences, we employed the Friedman Aligned Rank Test [24], which is suitable for comparing multiple algorithms across multiple datasets. The test confirmed that there are significant differences among the algorithms. Subsequently, pairwise post-hoc Friedman Aligned Rank Tests were conducted to identify which specific algorithm pairs differed significantly. The results are detailed in Table 3.

Table 3. Pairwise comparisons and significance tests between algorithms for each regression metric.

https://doi.org/10.1371/journal.pone.0319515.t003

Table 3 provides a detailed comparison between each pair of algorithms for every regression metric. The upper diagonal section displays the number of datasets where one algorithm outperforms, underperforms, or ties with another. For instance, under the MAE metric, the RRMSE Voting Regressor wins against both BR and DWR in all six datasets, ties with VRU once, and does not lose in any. The lower diagonal section presents the p-values from the post-hoc Friedman Aligned Rank Tests, indicating whether the performance differences are statistically significant.

The experimental results demonstrate that the RRMSE Voting Regressor consistently outperforms BR and DWR across multiple datasets and evaluation metrics. Specifically, RRMSE achieves the best average rank in five out of six datasets, indicating its robust ability to minimize prediction errors and explain variance effectively.

In the Abalone dataset, RRMSE ranks second, closely trailing VRU. This suggests that while RRMSE generally excels, there are scenarios where traditional ensemble methods like VRU can perform equally well or better. However, in the Car, Diamond, Airfoil, and Smart Grid Stability datasets, RRMSE maintains its top position, highlighting its versatility and effectiveness across diverse domains. The Elongation dataset further reinforces RRMSE’s superiority, where it achieves the best performance across all metrics, followed by BR, VRU, and DWR. The statistical significance tests confirm that these performance differences are not merely due to chance, particularly the significant improvements over DWR and BR.

The Diamond and Smart Grid Stability datasets, with their large and varied feature sets, highlight RRMSE’s capability to handle complex regression tasks effectively. The high $R^2$ scores across these datasets indicate that RRMSE not only reduces errors but also explains a substantial portion of the variance in the target variables. While VRU shows competitive performance, especially in the Abalone and Car datasets, the lack of statistically significant improvements over RRMSE suggests that RRMSE’s dynamic weighting provides a consistent advantage without being outperformed in specific instances.

Overall, the RRMSE Voting Regressor proves to be a robust and effective ensemble method for regression tasks, offering significant improvements over traditional and dynamic weighting approaches. Its ability to dynamically adjust weights based on relative errors ensures that more accurate models contribute more significantly to the final prediction, thereby enhancing the ensemble’s overall performance and reliability.

6. Conclusions

We introduce a novel weighted voting ensemble algorithm for regression tasks, named the RRMSE Voting Regressor. The algorithm assigns weights to base learners based on the inverse of their RRMSE, ensuring that more accurate models have a greater influence on the final prediction.

The experimental results presented in Section 5 demonstrate that the RRMSE Voting Regressor significantly outperforms BR and DWR across multiple datasets and evaluation metrics. While the RRMSE Voting Regressor also shows improved performance compared to VRU, the enhancement is marginal, indicating that VRU remains a competitive baseline in certain scenarios.

These findings confirm that the RRMSE Voting Regressor effectively enhances ensemble regression performance by dynamically weighting base learners according to their predictive accuracy. Moving forward, it would be valuable to further explore and refine the RRMSE Voting Regressor to enhance its predictive capabilities. Potential avenues for improvement include experimenting with a wider variety of base learners and optimizing their hyperparameters using techniques such as Random Search or Bayesian Optimization. Additionally, integrating more sophisticated base learner selection strategies could further elevate the ensemble’s performance.

In conclusion, the RRMSE Voting Regressor represents a promising advancement in ensemble regression methodologies, offering a robust framework for leveraging the strengths of individual models to achieve superior predictive accuracy. Future research aimed at optimizing base learner selection and hyperparameter tuning holds the potential to unlock even greater performance gains.

References

  1. Garcia-Pedrajas N, Hervas-Martinez C, Ortiz-Boyer D. Cooperative coevolution of artificial neural network ensembles for pattern classification. IEEE Trans Evol Comput. 2005;9(3):271–302.
  2. Roli F, Giacinto G, Vernazza G. Methods for designing multiple classifier systems. In: International Workshop on Multiple Classifier Systems. Berlin, Heidelberg: Springer; 2001.
  3. Mendes-Moreira J, Soares C, Jorge AM, Sousa JFD. Ensemble approaches for regression. ACM Comput Surv. 2012;45(1):1–40.
  4. Merz CJ. Classification and regression by combining models. University of California, Irvine; 1998.
  5. Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.
  6. Parmanto B, Munro PW, Doyle HR. Reducing variance of committee prediction with resampling techniques. Connection Science. 1996;8(3–4):405–26.
  7. Perrone MP, Cooper LN. Ensemble methods for hybrid neural networks. In: Mammone RJ, editor. Artificial Neural Networks for Speech and Vision. New York: Chapman-Hall; 1993. p. 126–42.
  8. Caruana R, et al. Ensemble selection from libraries of models. In: Proceedings of the Twenty-First International Conference on Machine Learning; 2004. p. 42–9.
  9. Merz CJ, Pazzani MJ. A principal components approach to combining regression estimates. Mach Learn. 1999;36:9–32.
  10. Woods K, Kegelmeyer WP, Bowyer K. Combination of multiple classifiers using local accuracy estimates. IEEE Trans Pattern Anal Mach Intell. 1997;19(4):405–10.
  11. Puuronen S, Terziyan V, Tsymbal A. A dynamic integration algorithm for an ensemble of classifiers. In: Foundations of Intelligent Systems: 11th International Symposium, ISMIS’99, Warsaw, Poland, June 8–11, 1999, Proceedings. Springer Berlin Heidelberg; 1999.
  12. Rooney N, et al. Dynamic integration of regression models. In: Multiple Classifier Systems: 5th International Workshop, MCS 2004, Cagliari, Italy, June 9–11, 2004, Proceedings. Springer Berlin Heidelberg; 2004.
  13. Asuncion A, Newman D. UCI machine learning repository. 2007.
  14. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
  15. García S, Fernández A, Luengo J, Herrera F. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information Sciences. 2010;180(10):2044–64.
  16. Ganaie MA, Hu M, Malik AK, Tanveer M, Suganthan PN. Ensemble deep learning: A review. Engineering Applications of Artificial Intelligence. 2022;115:105151.
  17. Sagi O, Rokach L. Ensemble learning: A survey. WIREs Data Mining and Knowledge Discovery. 2018;8(4):e1249.
  18. Mienye ID, Sun Y. A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access. 2022;10:99129–49.
  19. Kocev D, Vens C, Struyf J, Džeroski S. Tree ensembles for predicting structured outputs. Pattern Recognition. 2013;46(3):817–33.
  20. Sesmero MP, Ledezma AI, Sanchis A. Generating ensembles of heterogeneous classifiers using stacked generalization. WIREs Data Mining and Knowledge Discovery. 2015;5(1):21–34.
  21. Tresp V, Taniguchi M. Combining estimators using non-constant weighting functions. Advances in Neural Information Processing Systems. 1994;7.
  22. Hänsch R. The power of voting: Ensemble learning in remote sensing. In: Advances in Machine Learning and Image Analysis for GeoAI. Elsevier; 2024. p. 201–35.
  23. Uzunangelov V, Wong CK, Stuart JM. Accurate cancer phenotype prediction with AKLIMATE, a stacked kernel learner integrating multimodal genomic data and pathway knowledge. PLoS Comput Biol. 2021;17(4):e1008878. pmid:33861732
  24. Ali Y, Hussain F, Haque MM. Advances, challenges, and future research needs in machine learning-based crash prediction models: A systematic review. Accid Anal Prev. 2024;194:107378. pmid:37976634