Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data

doi:10.1371/journal.pcbi.1006258

Fig 1.

Prediction performance of the best tuned models.

Accuracy and F1 score (the harmonic mean of precision and recall; y-axis) for resistant (top panel) and susceptible (middle panel) phenotypes, for four predictive models (red: gradient boosted decision trees; green: logistic regression; teal: random forests; purple: deep learning) across eleven antibiotics (x-axis). For every drug, the best model of each class was selected based on its accuracy for predicting resistance. Each selected model used some combination of gene presence, population structure, and year of isolation features (lower panel; black: feature used; white: feature not used).
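The per-phenotype F1 score in Fig 1 can be illustrated with a minimal sketch. It assumes phenotypes are encoded as the hypothetical labels "R" (resistant) and "S" (susceptible), and that accuracy is the overall fraction of correct calls. The caption does not specify exactly how "accuracy for predicting resistance" was computed, so treat this as an illustration of the metric definitions, not the authors' evaluation code.

```python
from typing import Sequence


def class_metrics(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> dict:
    """Precision, recall and F1 for one phenotype class, treating `positive` as the target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly called
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly called positive
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of isolates whose predicted phenotype matches the observed one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


observed = ["R", "R", "S", "S", "R"]
predicted = ["R", "S", "S", "R", "R"]
print(class_metrics(observed, predicted, "R"))  # F1 for the resistant class
print(class_metrics(observed, predicted, "S"))  # F1 for the susceptible class
print(accuracy(observed, predicted))
```

Computing F1 separately for each class, as in the top and middle panels, exposes cases where a model scores well on the majority phenotype but poorly on the minority one.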

Table 1.

Prediction metrics on held-out data for the best-performing gradient boosted decision trees model.

Fig 2.

Population structure and phenotypic distribution of the input data.

A) Phylogenetic distribution of clusters identified in the population for SNP distance cut-off values of 2, 143, 5054 and 14489 (outer circles) relative to the phylogenetic tree. B) Phylogenetic distribution of correct calls (true positives, true negatives) and errors (false positives, false negatives) when predicting ciprofloxacin (CIP) resistance with the best-performing gradient boosted model. The accuracy for resistance was 0.91. C) Phylogenetic distribution of the most important population structure feature, clustering with an SNP cut-off of 129 (outer ring), compared with the phylogenetic distribution of the resistance phenotype (inner ring; blue: susceptible; light red: intermediate; red: resistant) on the test dataset. Only clusters with more than one member are shown.
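One way to turn SNP distance cut-offs into population structure features is sketched below. It assumes single-linkage clustering on a pairwise SNP distance matrix, where two isolates share a cluster if they are connected by a chain of pairs at most `cutoff` SNPs apart, followed by one binary membership feature per cluster. The clustering method, the inclusive `<=` comparison, the dropping of singleton clusters, and the feature names are all assumptions for illustration, not the procedure confirmed by the figure.

```python
def snp_clusters(dist: list[list[int]], cutoff: int) -> list[int]:
    """Single-linkage clusters at `cutoff`, via union-find over pairs within the cut-off."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters
    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def cluster_features(dist: list[list[int]], cutoffs: list[int]) -> dict[str, list[int]]:
    """One 0/1 membership column per multi-member cluster at each cut-off (hypothetical naming)."""
    features: dict[str, list[int]] = {}
    for c in cutoffs:
        labels = snp_clusters(dist, c)
        for k in sorted(set(labels)):
            column = [int(label == k) for label in labels]
            if sum(column) > 1:  # assumption: singleton clusters carry no shared signal
                features[f"cut{c}_cl{k}"] = column
    return features


# Toy distance matrix for four isolates: two tight pairs that are far apart
d = [[0, 1, 50, 60], [1, 0, 55, 58], [50, 55, 0, 3], [60, 58, 3, 0]]
print(snp_clusters(d, 2))
print(cluster_features(d, [2, 5]))
```

Using several cut-offs at once gives the model access to both fine-grained (near-clonal) and coarse (lineage-level) population structure, which matches the spread of cut-off values shown in panel A.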

Table 2.

Comparison of prediction results with rule-based models using the ResFinder and CARD databases of antibiotic resistance genes.
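A rule-based baseline of the kind compared in Table 2 can be sketched as follows, assuming the simplest rule: an isolate is called resistant to a drug if it carries at least one gene that the database associates with that drug, and susceptible otherwise. The gene names and isolate identifiers below are hypothetical placeholders, and the exact rules used with ResFinder and CARD in the study may differ.

```python
def rule_based_predictions(isolates: dict[str, set[str]], drug_genes: set[str]) -> dict[str, str]:
    """Call 'R' if an isolate carries any database gene for the drug, else 'S'."""
    return {name: ("R" if genes & drug_genes else "S") for name, genes in isolates.items()}


def agreement(predicted: dict[str, str], observed: dict[str, str]) -> float:
    """Fraction of isolates where the rule-based call matches the observed phenotype."""
    return sum(predicted[k] == observed[k] for k in observed) / len(observed)


# Hypothetical genes listed for one drug in a resistance-gene database
drug_genes = {"geneA", "geneB"}
isolates = {"iso1": {"geneA", "geneX"}, "iso2": {"geneX"}, "iso3": {"geneB"}}
observed = {"iso1": "R", "iso2": "R", "iso3": "R"}

calls = rule_based_predictions(isolates, drug_genes)
print(calls)
print(agreement(calls, observed))
```

Because a rule of this kind only recognizes genes already catalogued for the drug, an isolate such as `iso2`, which is resistant but carries no listed gene, is missed. A learned model can instead weigh any gene, population structure, or year-of-isolation feature.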
