The ability to classify patients based on gene-expression data varies by algorithm and performance metric

doi:10.1371/journal.pcbi.1009926


Fig 3

Tradeoff between execution time and predictive performance for classification algorithms.

Using gene-expression predictors only (Analysis 1), we calculated the median area under the receiver operating characteristic curve (AUROC) across 50 iterations of Monte Carlo cross-validation for each combination of dataset, class variable, and classification algorithm. We also measured the median execution time (in seconds) for each algorithm across these scenarios. sklearn/logistic_regression attained the highest predictive performance and was the fourth-fastest algorithm (median = 5.3 seconds). The y-axis is shown on a log10 scale. We used arbitrary AUROC thresholds to group the algorithms into low, moderate, and high predictive ability.
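The summary statistics described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`auroc`, `monte_carlo_cv`), the stratified splitting, the one-third test fraction, and the `fit_predict` callback are all assumptions made for illustration. It uses only the Python standard library and shows how a median AUROC and a median execution time might be obtained over repeated random train/test splits.

```python
import random
import statistics
import time


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def monte_carlo_cv(X, y, fit_predict, n_iter=50, test_frac=1 / 3, seed=0):
    """Return (median AUROC, median seconds) over n_iter random splits.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains a model and returns one score per test sample.
    The test fraction and stratification are assumptions, not source values.
    """
    rng = random.Random(seed)
    by_class = {c: [i for i, v in enumerate(y) if v == c] for c in (0, 1)}
    aurocs, seconds = [], []
    for _ in range(n_iter):
        train, test = [], []
        # Stratified split: hold out the same fraction of each class.
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            n_test = max(1, int(len(shuffled) * test_frac))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        start = time.perf_counter()
        scores = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        seconds.append(time.perf_counter() - start)
        aurocs.append(auroc([y[i] for i in test], scores))
    return statistics.median(aurocs), statistics.median(seconds)


def first_feature_scorer(X_train, y_train, X_test):
    """Toy stand-in for a classifier: scores each sample by its first feature."""
    return [row[0] for row in X_test]
```

Calling `monte_carlo_cv(X, y, first_feature_scorer)` on a small labeled dataset returns the two medians for that one algorithm; repeating the call per algorithm and per dataset would give the kind of points plotted in the figure, under the assumptions stated above.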

doi: https://doi.org/10.1371/journal.pcbi.1009926.g003