Using genomic data and machine learning to predict antibiotic resistance: A tutorial paper

doi:10.1371/journal.pcbi.1012579

Table 1.

Example of antibiotic phenotypic data from the Moradigaravand dataset.

Table 2.

Example of gene presence-absence table.

Fig 1.

Logistic Regression.

A sigmoid function that is representative of a logistic regression model.
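The sigmoid curve in this figure can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own code: the function names `sigmoid` and `predict_proba`, the example weights, and the bias are all hypothetical, and the features are assumed to be 0/1 gene presence-absence values.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of the input features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical example: three genes (1 = present, 0 = absent) with made-up weights.
example_features = [1, 0, 1]
example_weights = [2.0, -0.5, 1.0]
example_bias = -1.5
probability_resistant = predict_proba(example_features, example_weights, example_bias)
```

A probability above a chosen threshold (0.5 is a common default) would be read as a resistant prediction; the threshold is a modelling choice rather than part of the sigmoid itself.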

Fig 2.

Random Forest.

Multiple decision trees that are the basis of a random forest model.

Fig 3.

Extreme Gradient-Boosted Tree.

Decision trees created sequentially in Extreme Gradient-Boosted Tree (XGBoost).

Fig 4.

Neural Networks composed of different nodes inspired by neurons.

Fig 5.

Cross Validation.

(A) k-fold cross validation (4-fold shown here). (B) Stratified blocked cross validation. Code is provided for both options: regular random k-fold cross validation and stratified blocked cross validation based on multilocus sequence typing (MLST). For both options, the initial split into training and testing data is stratified by label: resistant (R) or susceptible (S).
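The difference between the two fold-assignment schemes can be sketched as follows. This is a simplified, standard-library illustration and not the code provided with the tutorial: the helper names, the greedy group assignment, and the sequence-type labels are assumptions made for the example, and the label-stratification step is omitted for brevity.

```python
import random
from collections import defaultdict

def random_k_fold(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def blocked_k_fold(groups, k):
    """Keep every group (e.g. an MLST sequence type) inside a single fold.

    Groups are placed largest first into whichever fold is currently smallest.
    This is a simple greedy heuristic chosen for the sketch.
    """
    members = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)
    folds = [[] for _ in range(k)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[group])
    return folds

def train_test_pairs(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)

# Hypothetical sequence types for eight isolates.
example_groups = ["ST131", "ST131", "ST73", "ST73", "ST95", "ST10", "ST131", "ST69"]
example_blocked = blocked_k_fold(example_groups, k=4)
example_random = random_k_fold(len(example_groups), k=4)
```

With blocking, closely related isolates never appear on both sides of a train/test split, so the validation score is less likely to be inflated by near-duplicate genomes.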

Fig 6.

Example of how confusion matrix results are used to calculate evaluation metrics.

(A) Toy examples showing how true positive (TP), true negative (TN), false positive (FP), and false negative (FN) labels are determined. (B) A confusion matrix based on the toy model from (A) showing total numbers of TP, TN, FP, and FN. (C) Numbers from the confusion matrix are used to calculate various evaluation metrics.
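The path from predictions to metrics can be sketched in Python as below. This is an illustrative sketch rather than the tutorial's code: it assumes resistant (R) is treated as the positive class, uses made-up labels, and computes a set of commonly used metrics that may not match the exact set shown in panel (C).

```python
def confusion_counts(y_true, y_pred, positive="R"):
    """Count TP, TN, FP, and FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def evaluation_metrics(tp, tn, fp, fn):
    """Derive common metrics from confusion-matrix counts (0.0 if undefined)."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,  # also called sensitivity
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Made-up labels for eight isolates.
example_true = ["R", "R", "S", "S", "R", "S", "S", "R"]
example_pred = ["R", "S", "S", "R", "R", "S", "S", "R"]
example_counts = confusion_counts(example_true, example_pred)
example_metrics = evaluation_metrics(*example_counts)
```

In this toy case the counts are 3 TP, 3 TN, 1 FP, and 1 FN, so every metric listed works out to 0.75.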

Table 3.

Python libraries used in the tutorials.

Fig 7.

Final data visualization result from the last tutorial notebook.

The figure shows a set of bar graph subplots, one per evaluation metric. Only the highest-performing combination of features for each metric is shown. The color key refers to the machine learning models (pink for logistic regression, green for random forest, yellow for extreme gradient-boosted trees, and blue for neural network). The black and white grid at the bottom indicates which features were used by the highest-performing model for each antibiotic: ceftazidime (CTZ), cefotaxime (CTX), ampicillin (AMP), amoxicillin (AMX), amoxicillin-clavulanate (AMC), piperacillin-tazobactam (TZP), cefuroxime (CXM), cephalothin (CET), gentamicin (GEN), tobramycin (TBM), trimethoprim (TMP), ciprofloxacin (CIP).
