Fig 1.
Data sources for SMH-TB database.
Fig 2.
Patient-level and encounter-level data in SMH-TB.
Table 1.
Variables available in SMH-TB from both structured and unstructured sources.
Fig 3.
Example of a component of a ruleset for extracting a variable (active TB diagnosis) from unstructured text in clinical dictations (using CHARTextract).
Fig 4.
QuickLabel interface for manual variable abstraction.
(A) Value labels are shown for two example variables: the Tuberculin Skin Test (TST) and the Interferon Gamma Release Assay (IGRA). (B) A screenshot of a representative data extraction using the QuickLabel tool. The corresponding sentences containing the variables of interest are highlighted in yellow.
Table 2.
Derivation of the value labels for diabetes mellitus.
Table 3.
Demographics of the patients included in the SMH-TB database, 2011–2018.
Table 4.
Summary of performance metrics on the test set for variables extracted from unstructured dictations. Patients included in the test set: N = 200.
Table 5.
Binomial proportion estimates and 95% confidence intervals (CIs) from standard binary regression and the MC-SIMEX model for binary variables created from extracted variables.
Total patients with at least 1 dictation: N = 3237.
Table 6.
Association between demographic characteristics and receipt of LTBI treatment.
Total patients who were diagnosed with LTBI: N = 1473.