Fig 1.
Overview of the VAMPr workflow.
The VAMPr pipeline processed sequence data from the NCBI Short Read Archive (SRA) and antibiotic resistance phenotypes from NCBI BioSample antibiograms. The curated AMR genotypes and AMR phenotypes were then used to build both association and prediction models.
Fig 2.
Summary of significant variant associations and prediction accuracies from 93 species-antibiotic combinations.
Both heatmaps display the number of curated isolates for each combination of 9 bacterial species and 29 antibiotics from 13 drug categories. Boxes without a number indicate that no isolates were available for that species-antibiotic combination. A) The color of each box indicates the number of gene-antibiotic resistance associations with FDR-adjusted p-values < 0.05 from the VAMPr association models; the actual numbers are shown in parentheses. B) The color indicates the cross-validated prediction accuracy from the VAMPr prediction models; the accuracies are shown in parentheses.
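Panel A counts associations whose FDR-adjusted p-values fall below 0.05, but the caption does not name the correction procedure. As a sketch, assuming the Benjamini-Hochberg procedure (a common FDR method, not confirmed by this caption), adjusted p-values can be computed as below. The function name and the input p-values are illustrative only and are not VAMPr results.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used here as a stand-in FDR method for illustration.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only (hypothetical, not from the paper).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

With these made-up inputs, two raw p-values below 0.05 (0.04 and 0.03) no longer pass the 0.05 threshold after adjustment, which is why counting associations on adjusted rather than raw p-values matters when many genes are tested per species-antibiotic pair.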
Fig 3.
Examples of variant-phenotype relationships determined by the association models.
(A) K18768.0 indicates blaKPC, the K. pneumoniae carbapenemase. The presence of blaKPC is associated with resistance to ceftazidime in K. pneumoniae. (B) K18093.13 is oprD, an imipenem/basic amino acid-specific outer membrane pore; absence of oprD is associated with resistance to imipenem in P. aeruginosa. (C) K18790.0 represents blaOXA-1, the class D beta-lactamase OXA-1; its presence is associated with resistance to cefepime in E. coli. (D) K19278.0 is the aac(6')-Ib gene; the presence of this variant is associated with amikacin resistance in A. baumannii. The “+” and “-” signs on the x-axis indicate whether the wild-type gene is present or absent. Numbers in the plots give the total number of isolates with each MIC value. Each gray dot represents an MIC value, and the red horizontal lines mark the mean and standard error of the group-wise MIC measurements. P-values were calculated with Fisher’s exact test. MIC: minimal inhibitory concentration.
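The panel p-values come from Fisher’s exact test. Below is a minimal standard-library sketch of the two-sided test on a 2x2 table. It assumes rows are variant present/absent and columns are resistant/susceptible. This layout is a hypothetical choice, because the caption does not say how MIC values were dichotomized. "Two-sided" here means summing every table whose hypergeometric probability is no greater than that of the observed table, which is the convention documented for R's `fisher.test`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows = variant present / absent,
    columns = resistant / susceptible.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates the classic tea-tasting table and gives 34/70.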
Table 1.
Prediction metrics for 32 VAMPr prediction models.
Of the 93 prediction models, the 32 with mean prediction accuracy above 95% are listed. The isolate and variant counts derived from sequencing were used to build the prediction models with gradient boosted tree algorithms. Accuracy was estimated with nested cross-validation: 10-fold outer cross-validation was used to report accuracy, and 5-fold inner cross-validation was used for hyperparameter tuning.
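The nested cross-validation scheme can be sketched in plain Python. To keep the sketch dependency-free, it substitutes a k-nearest-neighbour classifier on binary variant-presence vectors for the gradient boosted trees, and it tunes k as the hyperparameter. The model, the grid, and all function names are hypothetical stand-ins. Only the 10-fold outer and 5-fold inner structure follows the caption.

```python
import random


def kfold_indices(n, k, seed):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k):
    """Majority vote of the k training rows nearest to x by Hamming distance.

    Stand-in model (assumption): VAMPr itself uses gradient boosted trees.
    """
    dists = sorted(
        (sum(a != b for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def fold_accuracy(X, y, train_idx, test_idx, k):
    """Fit on train_idx (kNN simply stores it) and score on test_idx."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, grid=(1, 3, 5), outer_k=10, inner_k=5, seed=0):
    """Mean outer-fold accuracy, with hyperparameters tuned only on inner folds."""
    outer_folds = kfold_indices(len(y), outer_k, seed)
    outer_scores = []
    for i, test_idx in enumerate(outer_folds):
        train_idx = [j for f in outer_folds[:i] + outer_folds[i + 1:] for j in f]
        inner_folds = kfold_indices(len(train_idx), inner_k, seed + 1 + i)

        def inner_score(k):
            # Average validation accuracy over the inner folds of this outer training set.
            scores = []
            for m, val in enumerate(inner_folds):
                fit = [train_idx[t] for f in inner_folds[:m] + inner_folds[m + 1:] for t in f]
                held = [train_idx[t] for t in val]
                scores.append(fold_accuracy(X, y, fit, held, k))
            return sum(scores) / len(scores)

        best_k = max(grid, key=inner_score)
        # The outer test fold never influences hyperparameter selection.
        outer_scores.append(fold_accuracy(X, y, train_idx, test_idx, best_k))
    return sum(outer_scores) / len(outer_scores)
```

The important design point is the separation of the two loops. Because hyperparameters are chosen without ever seeing the outer test fold, the reported outer accuracy is not inflated by tuning. On real data the inner loop would search gradient boosting settings instead of k.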
Fig 4.
Validation performance metrics using an external dataset.
The AUROC (area under the receiver operating characteristic curve) on the external dataset and the top three predictors (KEGG ortholog variants ranked by importance) from the prediction models are reported. A) ROC curve for E. coli and meropenem; B) ROC curve for K. pneumoniae and ceftazidime; C) ROC curve for P. aeruginosa and meropenem; D) top three predictors for E. coli and meropenem; E) top three predictors for K. pneumoniae and ceftazidime; F) top three predictors for P. aeruginosa and meropenem.
Table 2.
External validation of VAMPr prediction model.
The external dataset includes 13 Enterobacter cloacae, 31 Escherichia coli, 24 Klebsiella pneumoniae and 21 Pseudomonas aeruginosa isolates. All isolates were tested against 3 antibiotics (cefepime, ceftazidime and meropenem). Accuracy is the fraction of correct predictions, and AUROC is the area under the receiver operating characteristic curve. AUROC is reported as n/a for E. cloacae because all 13 isolates were susceptible to meropenem, so there were no resistant isolates to rank against susceptible ones.
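The two metrics can be illustrated with a short sketch. AUROC equals the probability that a randomly chosen resistant isolate receives a higher predicted score than a randomly chosen susceptible one, counting ties as one half (the Mann-Whitney formulation). This also shows why the metric is undefined when only one class is present. The function names, the 1 = resistant / 0 = susceptible coding, and the example scores are assumptions for illustration.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparisons (Mann-Whitney U / (n_pos * n_neg)).

    Assumed coding (hypothetical): label 1 = resistant, 0 = susceptible.
    Returns None when one class is absent, since AUROC is then undefined.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy(predicted, labels):
    """Fraction of predictions that match the observed phenotype."""
    return sum(p == l for p, l in zip(predicted, labels)) / len(labels)
```

For instance, `auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` ranks 3 of the 4 resistant/susceptible pairs correctly and returns 0.75. An all-susceptible set, like the 13 E. cloacae isolates with meropenem, returns `None`.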
Fig 5.
VAMPr provides rich sets of online resources for association models and prediction models.
Users can explore known or novel antibiotic resistance-associated variants, and can upload their own sequence assemblies to obtain antibiotic resistance predictions. (A) Association results webpage: users can explore variants, their interpretations, and their statistical significance; (B) detailed information, contingency table and odds ratio for variant K18768 in the association model, and distribution plots; (C) distribution plots for variant K18768 on the association model page; (D) the prediction model page accepts uploads of users’ sequence data for antibiotic resistance prediction.
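Panel B pairs a contingency table with an odds ratio. The sketch below computes a sample odds ratio from such a table. It assumes rows are variant present/absent and columns are resistant/susceptible, which is a hypothetical layout. The optional Haldane-Anscombe correction, which adds 0.5 to every cell, is also an assumption, included because zero cells are common for strongly associated genes. The caption does not say whether VAMPr applies it.

```python
def odds_ratio(a, b, c, d, haldane=False):
    """Sample odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows = variant present / absent,
    columns = resistant / susceptible.
    If haldane is True, add 0.5 to every cell (Haldane-Anscombe correction)
    so that zero cells do not produce 0 or infinity.
    """
    if haldane:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    if b * c == 0:
        # Infinite odds ratio when the denominator vanishes; undefined if both do.
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)
```

An odds ratio above 1 means resistance is more likely among isolates carrying the variant. For example, `odds_ratio(8, 2, 1, 5)` gives (8 × 5) / (2 × 1) = 20.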