Fig 1.
The overall process of the study.
Table 1.
Summary of the variables used in the study.
Fig 2.
The ICD-9 coded baseline.
Fig 3.
The event distribution of stroke subtypes among the four categories.
Fig 4.
Performance curves as the variable sets (Table 1) are added.
Table 2.
Performance of different classification algorithms for stroke case identification.
Table 3.
Statistical significance tests (paired t-test) of the performance differences between the machine learning algorithms and the baselines on stroke case identification.
Fig 5.
Precision-recall curves generated by the algorithms.
Table 4.
Performance of different classification algorithms for stroke type identification.
Table 5.
Statistical significance tests (paired t-test) of the performance differences between the machine learning algorithms and the baselines on stroke type identification.
Fig 6.
Confusion matrices generated by ICD9, CLIN, and RF on the test set.
Table 6.
Misclassification errors made by the RF algorithm on the test set.