Table 1.
Clinical features of the CT dataset.
For each feature, the absolute number and its relative frequency are shown.
Fig 1.
Workflow for implementing machine learning algorithms in the challenge.
The same 80:20 hold-out validation scheme was used for all initially trained machine learning algorithms. Additionally, each machine learning algorithm was trained on the training sample and validated with 100 rounds of 10-fold cross-validation. The resulting models were then validated on the independent test dataset. Performance was evaluated on both the training validation and the independent test in terms of the Area Under the ROC Curve (AUC), Accuracy, Sensitivity, Specificity, Precision and F1 score.
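Here is a minimal sketch of this split scheme in plain Python. It assumes samples are identified by integer indices and that the shuffles are unstratified, since the caption does not say whether the folds were stratified by class. The helper names `holdout_split` and `repeated_kfold` are hypothetical and are not taken from the authors' code.

```python
import random
from typing import Iterator, List, Tuple


def holdout_split(n_samples: int, test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and split them into (train, test) index lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def repeated_kfold(indices: List[int], k: int = 10, repeats: int = 100,
                   seed: int = 0) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (round, fold, train_idx, val_idx) for `repeats` rounds of k-fold CV.

    Each round reshuffles the indices, so every sample is used for validation
    exactly once per round.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        # Deal the shuffled samples into k folds of (nearly) equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for f in range(k):
            val_idx = folds[f]
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, train_idx, val_idx


# 80:20 hold-out, then 100 rounds of 10-fold CV on the training part only.
train_idx, test_idx = holdout_split(n_samples=100)
n_splits = sum(1 for _ in repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), n_splits)  # 80 20 1000
```

The 100 rounds differ only because the indices are reshuffled before each round. With a fixed shuffle, every round would reproduce the same 10 folds.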
Fig 2.
Workflow of the classifier Ensemble method.
The scores of the various algorithms were averaged and aggregated and then used as “features” of an ensemble machine learning model. Final performance for training and test was evaluated, and an XAI (explainable artificial intelligence) approach was implemented to explain which algorithm-derived feature had the greatest impact on the final predictions.
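The score-stacking step can be sketched as follows. This illustration rests on two assumptions. First, “averaged” is read as averaging each algorithm's score for a sample over the repeated validation rounds. Second, logistic regression (fit here by plain batch gradient descent) stands in for the ensemble learner, which the caption does not name. All function names and the toy scores are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_scores(scores_per_round: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Average each algorithm's per-sample scores over its validation rounds."""
    return {name: [sum(col) / len(rounds) for col in zip(*rounds)]
            for name, rounds in scores_per_round.items()}


def build_feature_matrix(averaged: Dict[str, List[float]],
                         order: Sequence[str]) -> List[List[float]]:
    """One row per sample, one column (feature) per algorithm score."""
    return [list(row) for row in zip(*(averaged[name] for name in order))]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 1.0,
                 epochs: int = 3000) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-model by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X: List[List[float]], w: List[float], b: float) -> List[float]:
    """Ensemble score for each row of stacked algorithm scores."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Toy example: two base algorithms, two validation rounds, six samples.
scores = {
    "alg_a": [[0.9, 0.8, 0.7, 0.2, 0.1, 0.3], [0.8, 0.9, 0.6, 0.1, 0.2, 0.2]],
    "alg_b": [[0.7, 0.9, 0.8, 0.3, 0.2, 0.1], [0.9, 0.7, 0.8, 0.1, 0.3, 0.2]],
}
labels = [1, 1, 1, 0, 0, 0]
X = build_feature_matrix(average_scores(scores), order=["alg_a", "alg_b"])
w, b = fit_logistic(X, labels)
ensemble_scores = predict_proba(X, w, b)
```

The fitted weights play the role of the per-algorithm importances that the XAI step then explains sample by sample.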
Fig 3.
Pie charts of the software (a), balancing technique (b), classifier (c) and feature selection technique (d) adopted by the various algorithms.
Fig 4.
Heatmaps of the correlation coefficients among the classification scores of all seven algorithms for training (a) and test (b).
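Each heatmap cell is the correlation between two algorithms' score vectors over the same samples. The sketch below assumes Pearson correlation. The caption does not state which coefficient was used, and a Spearman coefficient would apply the same formula to ranks instead. The algorithm names and scores are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant score vector")
    return cov / (sx * sy)


def correlation_matrix(scores: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise correlations between algorithms (the cells of one heatmap)."""
    names = list(scores)
    return names, [[pearson(scores[a], scores[b]) for b in names] for a in names]


scores = {
    "alg_a": [0.1, 0.4, 0.8, 0.9],
    "alg_b": [0.2, 0.5, 0.7, 1.0],
    "alg_c": [0.9, 0.6, 0.2, 0.1],
}
names, corr = correlation_matrix(scores)
for name, row in zip(names, corr):
    print(name, [round(v, 2) for v in row])
```

Highly correlated algorithms add little independent information to the ensemble, which is why such a heatmap is a useful check before stacking.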
Fig 5.
Score distributions of the various algorithms and of the Classifier Ensemble model for training (a) and test (b).
Fig 6.
Comparison of ROC curves and the resulting AUC values.
Blue curve: Ensemble model under a leave-one-out validation scheme on the training set; red curve: Ensemble model on the test set. The shaded area around each curve indicates the 95% confidence interval.
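The ROC points, the AUC and a 95% interval can be sketched as shown here. The caption does not state how the confidence intervals were obtained. This sketch assumes a percentile bootstrap over samples and computes the interval for the AUC only. A pointwise band around the curve, like the shaded area, can be built the same way by bootstrapping the TPR at a fixed grid of FPR values. The function names are hypothetical.

```python
import random
from typing import List, Tuple


def roc_curve(y_true: List[int], scores: List[float]) -> Tuple[List[float], List[float]]:
    """(FPR, TPR) points obtained by sweeping the threshold over every distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr


def auc_trapezoid(fpr: List[float], tpr: List[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


def bootstrap_auc_ci(y_true: List[int], scores: List[float], n_boot: int = 1000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # an AUC needs both classes in the resample
            s_b = [scores[i] for i in idx]
            aucs.append(auc_trapezoid(*roc_curve(y_b, s_b)))
    aucs.sort()
    lower = aucs[int(alpha / 2 * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


y = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]
s = [0.2, 0.7, 0.4, 0.9, 0.1, 0.35, 0.8, 0.5, 0.6, 0.3]
auc = auc_trapezoid(*roc_curve(y, s))
lower, upper = bootstrap_auc_ci(y, s)
print(round(auc, 3), round(lower, 3), round(upper, 3))
```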
Fig 7.
Radar plots of the performance of the various algorithms (dashed lines) and of the Classifier Ensemble model for training (a) and test (b). The performance metrics were AUC, Accuracy (ACC), Sensitivity (Sens), Specificity (Spe), Precision (Pre) and F1 score.
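Every threshold-based metric in the radar plots is derived from the confusion matrix. The sketch below assumes that scores are binarized at 0.5, because the caption does not give the threshold. The AUC is threshold-free and needs the full score ranking, so it is not computed here. The function name and toy data are hypothetical.

```python
from typing import Dict, List


def threshold_metrics(y_true: List[int], scores: List[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics for scores binarized at `threshold`."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: float, den: float) -> float:
        # Report 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sens = ratio(tp, tp + fn)   # recall / true positive rate
    spe = ratio(tn, tn + fp)    # true negative rate
    pre = ratio(tp, tp + fp)    # positive predictive value
    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "Sens": sens,
        "Spe": spe,
        "Pre": pre,
        "F1": ratio(2 * pre * sens, pre + sens),
    }


y = [1, 1, 1, 0, 0, 0, 0, 1]
s = [0.9, 0.7, 0.4, 0.6, 0.55, 0.1, 0.3, 0.8]
print(threshold_metrics(y, s))
```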
Fig 8.
Log-loss values of the various algorithms for training and test.
The values for each algorithm were averaged first and then used for the comparison.
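The caption does not specify whether the scores or the losses were averaged, and the two orders generally give different numbers. The sketch below computes both, using hypothetical names and toy probabilities. Because -log is convex, the log-loss of the averaged scores is never larger than the average of the per-round log-losses.

```python
import math
from typing import List


def log_loss(y_true: List[int], probs: List[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def mean_of_log_losses(y_true: List[int], probs_per_round: List[List[float]]) -> float:
    """Option A: compute the log-loss in every round, then average the losses."""
    losses = [log_loss(y_true, probs) for probs in probs_per_round]
    return sum(losses) / len(losses)


def log_loss_of_mean_scores(y_true: List[int], probs_per_round: List[List[float]]) -> float:
    """Option B: average each sample's score over the rounds, then compute one log-loss."""
    mean_probs = [sum(col) / len(col) for col in zip(*probs_per_round)]
    return log_loss(y_true, mean_probs)


y = [1, 0, 1, 0]
rounds = [[0.9, 0.2, 0.6, 0.4], [0.6, 0.4, 0.8, 0.1]]
print(mean_of_log_losses(y, rounds), log_loss_of_mean_scores(y, rounds))
```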
Fig 9.
Bee-swarm plot of the global model (a) and table of the corresponding strategies (b) adopted by each algorithm (outlier-handling mechanism, balancing technique, classifier, and feature selection algorithm).
Fig 10.
Force plots of misclassified non-metastatic samples.
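Bee-swarm and force plots are the standard visualizations of Shapley-value explanations, as produced, for example, by the SHAP library. The caption does not name the library that was used. The sketch below computes exact Shapley values by enumerating every feature coalition. It assumes that an “absent” feature is replaced by a single baseline value, whereas SHAP's explainers usually average over a background dataset. With the seven algorithm scores as features, the enumeration costs 2^6 = 64 coalitions per feature. The model and values are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x); absent features take their baseline value."""
    n = len(x)

    def value(present: set) -> float:
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical 3-feature model standing in for the ensemble's score function.
def toy_model(z: Sequence[float]) -> float:
    return 0.5 * z[0] + 2.0 * z[1] * z[2]


x = [0.8, 0.9, 0.3]
baseline = [0.5, 0.5, 0.5]
phi = exact_shapley(toy_model, x, baseline)
# Efficiency: base value + contributions = model output for this sample.
print(toy_model(baseline), phi, toy_model(baseline) + sum(phi), toy_model(x))
```

A force plot draws the baseline output plus these contributions, which sum exactly to the model output for the sample. A bee-swarm plot shows the same per-feature values across all samples.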