Fig 1.
A simplified visualization of a patient’s healthcare timeline.
The time between the specialist visit and the procedure is known because administrative records capture both accurately. We show our methodology in two steps: 1) using machine learning to predict the target specialty of referral notes, and 2) measuring the wait time between the referral and the consultation visit for the paired specialty.
Table 1.
The number of notes per specialist type.
Notes with fewer than 6 tokens were not counted, as manual checking showed they were not informative. The specialty “internal” was also removed because the types of medical consultation performed by an internal-medicine specialist overlap with many more specific specialties, such as cardiology and gastroenterology. Where present, the names within parentheses show how the specialty is labeled in later figures.
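The short-note filter described above can be sketched as follows; this is a minimal illustration that assumes token counts are obtained by whitespace splitting (the paper does not specify its tokenizer), and the example notes are hypothetical:

```python
def filter_short_notes(notes, min_tokens=6):
    """Drop referral notes with fewer than `min_tokens` whitespace tokens."""
    return [n for n in notes if len(n.split()) >= min_tokens]

notes = [
    "see cardiology",  # 2 tokens: dropped as uninformative
    "patient reports chest pain radiating to left arm on exertion",  # 10 tokens: kept
]
print(filter_short_notes(notes))  # only the second note survives
```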
Table 2.
All possible options for each step of the machine learning pipeline.
All possible combinations of these options were trained for each target specialty (for a total of 80 models per specialty).
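Training all combinations of pipeline options amounts to a grid whose option counts multiply to 80. The sketch below shows the enumeration pattern with hypothetical option names (the actual options are those in Table 2); any grid whose counts multiply to 80 yields the stated 80 models per specialty:

```python
from itertools import product

# Hypothetical pipeline options for illustration; see Table 2 for the real ones.
options = {
    "vectorizer": ["bow", "tfidf"],                       # 2
    "ngram_range": [(1, 1), (1, 2)],                      # 2
    "feature_selection": ["none", "chi2"],                # 2
    "classifier": ["logreg", "svm", "nb", "rf", "gbm"],   # 5
    "class_weight": ["none", "balanced"],                 # 2
}

# Cartesian product over all option lists -> one dict per pipeline configuration.
grid = [dict(zip(options, combo)) for combo in product(*options.values())]
print(len(grid))  # 2 * 2 * 2 * 5 * 2 = 80 combinations
```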
Table 3.
Precision, recall, and F1 score for each optimization target for each specialty found in Table 1.
We do not evaluate any models for the “aesthesia” class, as it had fewer examples than folds. Specialties in bold pass our performance threshold.
Table 4.
Precision, recall, specificity, negative predictive value (NPV), and Brier score loss for the models which we used for prediction.
The standard deviations for each measurement are presented between parentheses.
Fig 2.
The F1 score as a function of the number of training examples.
In this figure we plot the F1 score of the best model for each specialty (the last column of Table 3) against the number of training examples (Table 1).
Fig 3.
The average accuracy (across all specialties) of the highest-performing classifiers (when optimized for F1 score) for different lengths of the referral notes (measured in number of tokens).
We observe that notes with few tokens have poor accuracy. On the other hand, notes with many tokens are likely to have a noisy signal (containing more keywords belonging to other specialties), which also results in poor performance.
Fig 4.
The estimated median and 90th percentile wait time from family physician referral to specialist visit for nine specialties.
The error bars represent the 95% confidence interval (arrived at by bootstrapping).
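The bootstrapped confidence intervals can be sketched with a standard percentile bootstrap; this is a minimal stdlib-only illustration with hypothetical wait times, not the paper's actual estimation code:

```python
import random

def bootstrap_ci(values, stat=lambda xs: sorted(xs)[len(xs) // 2],
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a statistic (default: median).

    Resamples `values` with replacement `n_boot` times, computes the statistic
    on each resample, and returns the (alpha/2, 1 - alpha/2) percentiles.
    """
    rng = random.Random(seed)
    stats = sorted(
        stat([rng.choice(values) for _ in values]) for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

wait_days = [12, 30, 45, 18, 60, 25, 90, 33, 41, 27]  # hypothetical wait times
lo, hi = bootstrap_ci(wait_days)  # 95% CI for the median wait
print(lo, hi)
```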
Fig 5.
The estimated median and 75th percentile wait time from family physician referral to specialist visit for four specialties.
The error bars represent the 95% confidence interval (obtained by bootstrapping). The 2008 numbers are from our previous work and relied on manual coding.
Fig 6.
The effect of different confidence thresholds (x-axis) on classification metrics (left y-axis) and estimated median wait times (right y-axis) for the nine selected specialties.
The dashed green line represents the “baseline” wait time: the wait time estimated using only our gold labels (i.e., patients who had exactly one referral and one specialist consultation in 2015).
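The thresholding trade-off behind this figure can be sketched as follows: raising the confidence threshold keeps fewer, presumably more reliable, predictions, which changes both coverage and the resulting median wait estimate. The (probability, wait-days) pairs below are hypothetical:

```python
# Hypothetical (predicted probability, wait in days) pairs for one specialty.
preds = [(0.95, 40), (0.80, 35), (0.65, 55), (0.55, 70), (0.90, 30), (0.70, 45)]

def median(xs):
    """Median of a non-empty list."""
    xs = sorted(xs)
    mid = len(xs) // 2
    return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2

# Sweep the confidence threshold: higher thresholds keep fewer predictions,
# trading coverage for (presumed) precision and shifting the median estimate.
for threshold in (0.5, 0.7, 0.9):
    kept = [wait for prob, wait in preds if prob >= threshold]
    print(threshold, len(kept), median(kept))
```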