Using machine learning and big data to explore the drug resistance landscape in HIV

doi:10.1371/journal.pcbi.1008873


Fig 2

Discrimination between sequences with at least one known resistance-associated mutation (RAM) and sequences with none, after removing the training features that correspond to known RAMs.

NB: naive Bayes, LR: Logistic Regression with Lasso regularization, RF: Random Forest, FC: Fisher Classifier. A) Adjusted mutual information (higher is better) for classifiers trained without the features corresponding to known RAMs. The classifiers are trained either to discriminate RTI-naive from RTI-experienced sequences (blue; RTI: reverse transcriptase inhibitor) or to discriminate sequences with at least one known RAM from sequences with none (orange). Hatching and braced annotations indicate the training and testing sets used to obtain each performance measure. B) Balanced accuracy, i.e. the average of the per-class accuracies, for the same classifiers as in A) (higher is better). The red line at y = 0.5 marks the expected balanced accuracy both of a classifier that always predicts the majority class and of a uniform random (50/50) classifier.
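The baseline in panel B can be checked with a minimal sketch (not the authors' code). Balanced accuracy averages the per-class accuracies (recalls), so a classifier that always predicts the majority class scores 1 on that class and 0 on the other, giving 0.5 regardless of class imbalance. The labels below are hypothetical, assuming a 90/10 split between sequences with and without a RAM; the helper names `accuracy` and `balanced_accuracy` are illustrative.

```python
import random
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class accuracies (recall computed within each true class)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced labels: 1 = at least one known RAM, 0 = none.
y_true = [1] * 90 + [0] * 10

# Majority-class baseline: always predict 1.
majority = [1] * len(y_true)
print(accuracy(y_true, majority))           # 0.9 (inflated by the imbalance)
print(balanced_accuracy(y_true, majority))  # 0.5

# Uniform random (50/50) baseline, averaged over many draws to estimate
# its expected balanced accuracy, which should be close to 0.5.
rng = random.Random(0)
trials = 1000
random_ba = sum(
    balanced_accuracy(y_true, [rng.randint(0, 1) for _ in y_true])
    for _ in range(trials)
) / trials
print(round(random_ba, 3))
```

Plain accuracy rewards the majority-class predictor for the imbalance (0.9 here), whereas balanced accuracy places it at the same 0.5 level as random guessing, which is why a single reference line at y = 0.5 serves for both baselines.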

doi: https://doi.org/10.1371/journal.pcbi.1008873.g002