Fig 1.

Schematic of patient record selection and processing.

(a) The records were processed to extract only those notes written before the first note from a Child Abuse Pediatrics (CAP) team MD or NP, allowing prediction using only information available before the decision to refer a patient to the CAP team. (b) 1123 records for patients evaluated for suspected abuse between 1/1/2015 and 5/1/2019 were identified. Records were excluded for the reasons listed in the figure, leaving 867 records for deep learning. (c) Schematic of the cross-validation procedure used to create 10 distinct train-test splits.


Fig 2.

Cross-validation.

Boxplots showing accuracy for n = 10 trials for each of 10 train-test splits, using our chosen model architecture for each strategy. The orange line shows the median, and the box edges show the 1st and 3rd quartiles. The whiskers extend to 1.5 times the interquartile range (IQR); points greater than the 3rd quartile + 1.5×IQR or less than the 1st quartile − 1.5×IQR are shown as discrete points.
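The repeated-split procedure behind these boxplots (10 distinct train-test splits, n = 10 trials each) can be sketched as follows; the shuffling scheme, seed handling, and 20% test fraction are assumptions for illustration, not the paper's exact settings.

```python
import random

def make_splits(n_records, n_splits=10, test_frac=0.2, seed=0):
    """Create n_splits distinct train-test splits by shuffling the
    record indices with a different seed per split (illustrative
    stand-in for the procedure shown schematically in Fig 1c)."""
    splits = []
    for s in range(n_splits):
        rng = random.Random(seed + s)
        idx = list(range(n_records))
        rng.shuffle(idx)
        n_test = int(round(test_frac * n_records))
        splits.append((idx[n_test:], idx[:n_test]))  # (train, test)
    return splits

# 867 records remain after exclusions (Fig 1b)
splits = make_splits(867)
```

Each model architecture would then be trained n = 10 times on every split, yielding one boxplot of 10 accuracies per split.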


Fig 3.

Performance of the best model in each of 10 train-test splits.

(a) The average accuracy over ten repetitions; (b) the average area under the ROC curve (AUC) over ten repetitions.
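The AUC reported in panel (b) equals the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one (the rank-statistic formulation). A minimal pure-Python sketch of that computation (the paper itself presumably used a standard library routine):

```python
def auc_score(y_true, y_prob):
    """Rank-based AUC: fraction of positive-negative pairs in which
    the positive record scores higher, counting ties as half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if pp > nn else 0.5 if pp == nn else 0.0
               for pp in pos for nn in neg)
    return wins / (len(pos) * len(neg))

auc = auc_score([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.4])  # perfectly separated → 1.0
```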


Fig 4.

ROC curves, AUC, accuracy, PPV, sensitivity, specificity, and F1 score for the best-performing model in each train-test split for the BOW-TFIDF and rules-based approaches.

For each model category, the receiver operating characteristic (ROC) curve, AUC, accuracy, PPV, sensitivity, specificity, and F1 score for the best model in each train-test split are shown. The ROC curve shows the sensitivity-specificity tradeoff across classification thresholds, while the tables report the AUC together with the accuracy, PPV, sensitivity, specificity, and F1 score at the 0.5 threshold used in our classification algorithm. (a, c) BOW-TFIDF; (b, d) rules-based. The BOW models have the highest AUC, with a characteristic ROC plot shape, and high sensitivity, PPV, specificity, and F1 score.
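All of the tabulated threshold metrics follow from the four confusion-matrix counts once predicted probabilities are binarized at 0.5. A self-contained sketch (the function name and toy data are illustrative, not taken from the paper):

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, PPV, sensitivity, specificity, and F1 at a fixed
    probability threshold, from the confusion-matrix counts."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return dict(accuracy=acc, ppv=ppv, sensitivity=sens,
                specificity=spec, f1=f1)

m = classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6])
```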


Fig 5.

Leave-one-out sensitivity analysis of rules used in rules-based approach.

(a) The change in accuracy that occurs when each rule is invalidated (set to -1 for every record), shown for the best-performing model from the best train-test split (by maximum accuracy) and the best-performing model from the worst train-test split. For the best split, invalidating a rule either has no effect or lowers the accuracy, with the phrase “history domestic violence” having the greatest impact, reducing accuracy by 0.11 from 0.82 to 0.7. For the worst split, invalidating a rule can have no effect, or can raise or lower the accuracy. The phrases “history domestic violence” and “rib fracture” have the largest negative impact, reducing accuracy by 0.05 from 0.7 to 0.65, while the phrases “Inconsistent”, “unwitnessed”, “altered mental status”, “employment” and “witnessed” have the largest positive impact, increasing accuracy by 0.02 to 0.72. (b) Alphabetical list of rules that do not change accuracy during leave-one-out sensitivity analysis.
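The leave-one-out procedure amounts to re-scoring a fixed, trained model with one rule feature forced to -1 across all records and comparing accuracy against the unperturbed baseline. A schematic sketch, where `predict` is a hypothetical stand-in for the trained classifier's interface, not the paper's code:

```python
def rule_sensitivity(X, y_true, predict, rules):
    """For each rule, set its value to -1 in every record (invalidate
    it) and report the resulting change in accuracy versus baseline."""
    def accuracy(feats):
        preds = predict(feats)
        return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)
    baseline = accuracy(X)
    deltas = {}
    for rule in rules:
        X_invalid = [dict(rec, **{rule: -1}) for rec in X]
        deltas[rule] = accuracy(X_invalid) - baseline
    return baseline, deltas
```

Rules with a delta of zero would appear in the alphabetical list of panel (b); negative deltas mark rules the model relies on, and positive deltas mark rules that hurt it on that split.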
