Galaxy-ML: An accessible, reproducible, and scalable machine learning toolkit for biomedicine
Fig 2
Pairwise performance comparisons for use cases 1 and 2.
Use case 1 pairwise comparisons for classification tasks on 164 structured biomedical datasets [25] show that decision tree forests perform best (panel A) and that hyperparameter optimization can improve the performance of most models (panel B). Use case 2 results for prediction using regression (panel C) and classification (panel D) show that ensemble approaches using stacking perform best, though linear-based gradient boosting also performs well. In panels A, C, and D, heatmaps show the percentage of datasets for which the model listed along the row outperforms the model listed along the column. For instance, in panel A, XGBoost outperforms Gradient Tree Boosting (GTB) from scikit-learn on 38% of datasets, GTB outperforms XGBoost on 11% of datasets, and the two perform equivalently on 51% of datasets.
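The pairwise win-percentage matrices described above can be sketched as follows. This is a minimal illustration, not the paper's actual analysis code: the function name, the toy scores, and the tie tolerance `tol` are all assumptions introduced here for clarity. The idea is that, given a per-model score on each dataset, one model "outperforms" another on a dataset only when its score exceeds the other's by more than the tolerance; smaller differences count as ties, which is why the three percentages for any model pair (row wins, column wins, ties) sum to 100%.

```python
# Hypothetical sketch of the pairwise comparison underlying the heatmaps
# in panels A, C, and D. Names, scores, and tolerance are illustrative.
import numpy as np

def pairwise_win_percentages(scores, model_names, tol=1e-3):
    """Map (row_model, col_model) to the percentage of datasets on which
    the row model's score exceeds the column model's by more than `tol`.
    Differences within `tol` are treated as ties."""
    n_models, n_datasets = scores.shape
    wins = {}
    for i, row in enumerate(model_names):
        for j, col in enumerate(model_names):
            if i == j:
                continue
            better = np.sum(scores[i] > scores[j] + tol)
            wins[(row, col)] = 100.0 * better / n_datasets
    return wins

# Toy example: 3 models evaluated on 5 datasets (accuracy-like scores).
scores = np.array([
    [0.90, 0.85, 0.70, 0.88, 0.91],  # model "A"
    [0.89, 0.80, 0.70, 0.90, 0.85],  # model "B"
    [0.75, 0.78, 0.72, 0.80, 0.79],  # model "C"
])
names = ["A", "B", "C"]
table = pairwise_win_percentages(scores, names)
print(table[("A", "B")])  # % of datasets where A beats B by more than tol
```

On this toy data, model A beats model B on 3 of 5 datasets (60%), B beats A on 1 of 5 (20%), and one dataset is a tie, mirroring how the row/column/tie percentages in each heatmap cell pair sum to 100%.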