Table 1.
A brief explanation of the technical terms.
Table 2.
Information extracted from the included studies.
Fig 1.
Flow diagram showing the study selection process.
Fig 2.
The number of medical anomaly detection articles based on EHR data by data accessibility.
Dark grey = open; light grey = protected. Open = studies based on data that are publicly accessible or accessible on request (as stated in the papers); protected = studies based on data that were not publicly accessible (at the time of data extraction).
Fig 3.
Data preprocessing actions described by the included studies.
The figure presents the consecutive preprocessing actions from left to right: splitting the data into training and test sets (to evaluate the performance of the algorithm on previously unseen data), detection of biologically implausible values (to remove these from the dataset), scaling/normalization (to prevent bias from the algorithm giving too much weight to variables with intrinsically larger numeric values), missing value handling (see Methods and Results), and variable selection (to use for prediction only those variables that have a significant influence). ABC, Artificial bee colony; AUC, Area under the curve; CV, Cross validation; DBSCAN, Density-Based Spatial Clustering of Applications with Noise; ES, Evolutionary search; FSSMC, Feature selection via supervised model construction; HGPs, Hierarchical Gaussian processes; KDFS, Knowledge and data combined feature selection; k-NN, K-nearest neighbors; LOF, Local outlier factor; MICE, Multiple imputation by chained equations; missForest, a random forest–based imputation algorithm; ML, Machine learning; mRMR, Minimum redundancy maximum relevance; MR-PB-PFS, Map reduce-based machine learning algorithms; OC-SVM, One class support vector machine; OOB PPI, Out-of-bag permuted predictor importance; PSO, Particle swarm optimization; REF, recursive feature elimination; RF, Random forest.
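The order of preprocessing actions in Fig 3 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the method of any included study: the variable names (`heart_rate`, `temp_c`), the plausible ranges in `PLAUSIBLE`, and the correlation threshold are hypothetical, and min-max scaling, mean imputation and a correlation filter stand in for the more elaborate techniques listed in the legend (e.g. MICE, missForest, mRMR). Every parameter is fitted on the training set only and then applied to the test set, which is what keeps the test data "previously unseen".

```python
import random
from statistics import mean

# Hypothetical plausible ranges; real cut-offs would come from clinical knowledge.
PLAUSIBLE = {"heart_rate": (20.0, 250.0), "temp_c": (30.0, 45.0)}


def split(records, test_frac=0.25, seed=0):
    """Shuffle, then hold out a test set before any parameters are fitted."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def mask_implausible(records, ranges):
    """Set values outside their plausible range to None (treated as missing)."""
    cleaned = []
    for rec in records:
        new = dict(rec)
        for var, (lo, hi) in ranges.items():
            value = new.get(var)
            if value is not None and not (lo <= value <= hi):
                new[var] = None
        cleaned.append(new)
    return cleaned


def fit_minmax(train, variables):
    """Learn min and max from observed (non-missing) training values only."""
    params = {}
    for var in variables:
        observed = [r[var] for r in train if r[var] is not None]
        params[var] = (min(observed), max(observed))
    return params


def scale(records, params):
    """Min-max scale using training parameters; missing values stay None."""
    out = []
    for rec in records:
        new = dict(rec)
        for var, (lo, hi) in params.items():
            if new[var] is not None:
                new[var] = (new[var] - lo) / (hi - lo) if hi > lo else 0.0
        out.append(new)
    return out


def fit_means(train, variables):
    """Per-variable mean of observed training values (simple stand-in for MICE etc.)."""
    return {var: mean(r[var] for r in train if r[var] is not None) for var in variables}


def impute(records, means):
    """Replace remaining missing values with the training mean."""
    out = []
    for rec in records:
        new = dict(rec)
        for var, m in means.items():
            if new[var] is None:
                new[var] = m
        out.append(new)
    return out


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_variables(train, variables, label="label", threshold=0.1):
    """Keep variables whose |correlation| with the label reaches a hypothetical threshold."""
    ys = [r[label] for r in train]
    return [v for v in variables if abs(pearson([r[v] for r in train], ys)) >= threshold]


def preprocess(records, variables):
    """Split, mask implausible values, scale, impute, then select variables."""
    train, test = split(records)
    train, test = mask_implausible(train, PLAUSIBLE), mask_implausible(test, PLAUSIBLE)
    params = fit_minmax(train, variables)
    train, test = scale(train, params), scale(test, params)
    means = fit_means(train, variables)
    train, test = impute(train, means), impute(test, means)
    return train, test, select_variables(train, variables)
```

Masking implausible values as missing, rather than dropping whole records, is one possible design choice; it lets the later missing value handling step deal with them. The sketch assumes each variable has at least one observed value in the training set.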
Fig 4.
Performance metric co-occurrence among medical anomaly detection studies.
The figure presents the co-occurrence of metrics when more than one performance metric was reported (each specific combination is shown in a separate row; the respective metrics are shown in the column headers; n is the number of studies with that specific combination). BS, Brier Score; FNR, False Negative Rate; FPR, False Positive Rate; MCC, Matthews Correlation Coefficient; NPV, Negative Predictive Value; PR AUC, Area Under the Precision Recall Curve; ROC AUC, Area Under the Receiver Operating Characteristic Curve; YI, Youden’s index.
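Most of the metrics in Fig 4 derive from the same four confusion-matrix counts, which helps explain why they tend to be reported together. The sketch below is illustrative only and assumes binary labels with 1 = anomaly (the positive class); function names are hypothetical. ROC AUC is computed with the pairwise (Mann-Whitney) formulation, and PR AUC is omitted for brevity. Undefined ratios (zero denominator) return NaN.

```python
import math


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 = anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_pred):
    """Metrics computed from hard (thresholded) predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "fpr": _ratio(fp, fp + tn),  # equals 1 - specificity
        "fnr": _ratio(fn, fn + tp),  # equals 1 - sensitivity
        "youden": sensitivity + specificity - 1,
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because FPR and FNR are complements of specificity and sensitivity, and the Youden index is a linear combination of the two, some co-occurring metrics in Fig 4 carry overlapping information. The Brier score and ROC AUC, in contrast, require predicted probabilities or scores rather than hard predictions.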
Fig 5.
Frequency of used supervised algorithms (left) and unsupervised algorithm categories (right) in the included articles.
ANN, artificial neural network; BBHA, binary black hole algorithm; DA, discriminant analysis; DT, decision tree; LR, logistic regression; NB, naïve Bayes; SVM, support vector machine.
Table 3.
A checklist of characteristics that are useful to report in publications on medical anomaly detection studies.