Using machine learning and big data to explore the drug resistance landscape in HIV
Fig 2
Discrimination between sequences with at least one RAM and sequences with none, after removing the training features corresponding to known RAMs.
NB: naive Bayes; LR: logistic regression with Lasso regularization; RF: random forest; FC: Fisher classifier. A) Adjusted mutual information (higher is better) for classifiers trained without the features corresponding to known RAMs. The classifiers are trained to discriminate either RTI-naive from RTI-experienced sequences (blue), or sequences with at least one known RAM from sequences with none (orange). Hatching and braced annotations indicate the training and testing sets that produced a given performance measure. B) Balanced accuracy, i.e. the average of the per-class accuracies, for the same classifiers as in A) (higher is better). The red line at y = 0.5 is the expected value both for a classifier that only predicts the majority class and for a uniform random (50/50) classifier.
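To illustrate why 0.5 is the baseline for balanced accuracy in panel B, here is a minimal Python sketch. The helper `balanced_accuracy`, the label vectors, and the 80/20 class split are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Average of the per-class accuracies (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Accuracy within each true class, averaged with equal weight per class.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Assumed toy labels: 1 = at least one known RAM, 0 = no known RAM.
y_true = [1] * 80 + [0] * 20

# A classifier that always predicts the majority class.
majority = [1] * 100

# Deterministic stand-in for a 50/50 guesser: alternate the two labels.
alternating = [i % 2 for i in range(100)]

plain_accuracy = sum(t == p for t, p in zip(y_true, majority)) / len(y_true)
majority_bal_acc = balanced_accuracy(y_true, majority)
alternating_bal_acc = balanced_accuracy(y_true, alternating)

print(plain_accuracy, majority_bal_acc, alternating_bal_acc)  # 0.8 0.5 0.5
```

On this toy split, plain accuracy rewards the majority-class predictor with 0.8, whereas balanced accuracy gives it 0.5, the same score as the label-alternating guesser. This is why the red line sits at y = 0.5 regardless of class imbalance.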