Table 1.
Variables used as inputs (i.e., features) to the machine learning models.
Table 2.
Demographic breakdown of presented and published abstracts across the NASS AGM 2013–2015 for the entire dataset.
Fig 1.
Network plot representing the correlation of features in the training set.
Colour indicates the direction of the correlation according to the scale on the right-hand side. Line thickness and the proximity of features indicate the strength of the correlation. Asterisks (*) denote categorical features.
Fig 2.
Receiver operating characteristic (ROC) curves for the models during training and testing.
A larger area under the ROC curve (AUC) indicates a better-performing model.
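As an illustration of what the area under the ROC curve measures, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half). The function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the study, whose actual implementation is not described in this caption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of predicted scores, higher meaning "more positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Count positive/negative pairs where the positive case is ranked higher;
    # a tie contributes half a win.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example data (not from the study).
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.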
Fig 3.
Confusion matrices of various algorithms applied to the testing data.
Each matrix is read like a 2-by-2 epidemiologic table: true positives and true negatives lie in the top-left and bottom-right cells, and false positives and false negatives in the top-right and bottom-left cells.
Table 3.
Mean resampled accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) during model training, cross-validation, and testing.
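For readers who want to reproduce the threshold-based measures from a 2-by-2 confusion matrix, a minimal sketch using the standard definitions follows. The function name `classification_metrics` and the example counts are hypothetical and are not values from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard measures derived from the four cells of a 2-by-2 table.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,    # correct predictions / all cases
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

The sketch assumes every denominator is non-zero; a table with no predicted positives, for example, leaves the PPV undefined.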
Fig 4.
Bar plot representing the top ten most important features used by the random forest model.
Importance is expressed as a percentage (%) normalized to the most important feature.
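The normalization described here can be sketched as follows: each raw importance score is divided by the largest score and multiplied by 100, so the top feature reads 100%. The function names and the feature names in the example are hypothetical placeholders, not the features or values reported in the study.

```python
def normalize_importance(raw):
    """Scale raw importance scores so the most important feature equals 100."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("the largest importance score must be positive")
    return {name: 100.0 * value / top for name, value in raw.items()}


def top_features(raw, k=10):
    """Return the k most important features as (name, percent) pairs, largest first."""
    scaled = normalize_importance(raw)
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical raw scores for illustration only.
example_raw = {"feature_a": 2.0, "feature_b": 0.5, "feature_c": 1.0}
example_top = top_features(example_raw, k=2)
```

Because the values are relative to the top feature, they support comparison within one model but not across models.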