An improved dataset for predicting mammal infecting viruses from genetic sequence information
Fig 6
The performance of common machine learning models was evaluated on the corrected and rebalanced datasets for human, primate, and mammal host targets.
The ROC AUC for each model is reported for predictions on the test set across 10 seeds. The violin plot displays the distribution of ROC AUC across estimators, with dashed lines marking the three quartiles: the 25th percentile, the median, and the 75th percentile.
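As a rough illustration of the summary statistics behind such a plot, the sketch below computes one ROC AUC per seed and then the three quartiles that the dashed lines would mark. It is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names `roc_auc` and `synthetic_test_scores`, the sample size, and the synthetic Gaussian scores are all hypothetical stand-ins for a real model's test-set predictions.

```python
import random
import statistics


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def synthetic_test_scores(seed, n=200):
    """Hypothetical stand-in for a model's test-set output under one seed.
    Positives are scored from N(1, 1) and negatives from N(0, 1)."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]
    return labels, scores


# One ROC AUC value per seed, mirroring the 10 seeds in the caption.
aucs = [roc_auc(*synthetic_test_scores(seed)) for seed in range(10)]

# The three quartiles: 25th percentile, median, and 75th percentile.
q1, median, q3 = statistics.quantiles(aucs, n=4, method="inclusive")
print(f"Q1={q1:.3f}  median={median:.3f}  Q3={q3:.3f}")
```

The pairwise definition of ROC AUC is quadratic in the number of test examples, which is acceptable for a small illustration; a rank-based implementation from a standard library would be the usual choice for datasets of realistic size.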