Table 1.
Dataset characteristics.
Figure 1.
Representation of the processing flow for automatic disease activity labeling.
Abbreviations: CUI – Unified Medical Language System Concept Unique Identifier; cTAKES – clinical Text Analysis and Knowledge Extraction System; LR – Low/Remission disease activity; MH – Moderate/High disease activity; EMR – Electronic Medical Record.
Figure 2.
Lab-value and 20 top-ranked CUIs.
The Chi-square values of these features are shown as bars; a longer bar indicates a higher Chi-square value and therefore a higher impact. A minus sign “-” before a CUI indicates that the concept was negated (CUI – Unified Medical Language System Concept Unique Identifier).
Figure 3.
Histogram of DAS28 scores for 25 discordant cases.
The discordance is between the DAS28-derived labels and the domain expert labels in 93 random samples from the Training Set; the remaining 68 cases were concordant.
Figure 4.
Error analysis of the best performing classifier.
Of the 429 misclassified cases (using DAS28-derived dichotomous labels as the gold standard), the majority fall in the Moderate and Low disease activity categories.
Table 2.
Effect of corpus selection on Test set 1 using a linear-kernel SVM model.
Table 3.
Feature contribution.
Table 4.
Portability testing.
Figure 5.
Scatter plot of DAS28 scores and log-transformed lab values.
(Left) Scatter plot of DAS28 scores and log-transformed lab values for the 1320 correctly classified notes. (Right) The same plot for the 429 misclassified notes. The lines shown are regression lines.
Figure 6.
(Left) Range of lab values for Moderate/High (MH) disease activity cases vs. Low/Remission (LR) disease activity cases among the 1320 correctly classified notes. (Right) The same comparison among the 429 misclassified notes.