Chapter 13: Mining Electronic Health Records in the Genomics Era
Figure 3
General figure for identifying cases and controls using EHR data.
Application of electronic selection algorithms divides a population of patients into four groups, the largest of which comprises patients who were excluded because they lack sufficient evidence to be either a case or a control patient. Definite cases and controls cross some predefined threshold of positive predictive value (e.g., PPV≥95%), and thus do not require manual review. For very rare phenotypes or complicated case definitions, the category of “possible” cases may need to be reviewed manually to increase the sample size.
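
The triage described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the chapter's algorithm: the function names, the three boolean inputs, and the group labels are hypothetical, and the 0.95 threshold is simply the example value quoted above. It assumes that PPV for a group is estimated from a manually reviewed validation sample as true positives divided by all algorithm-selected patients in that sample.

```python
PPV_THRESHOLD = 0.95  # example threshold taken from the caption (PPV >= 95%)


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: TP / (TP + FP) in a reviewed validation sample."""
    selected = true_positives + false_positives
    if selected == 0:
        raise ValueError("PPV is undefined when no patients were selected")
    return true_positives / selected


def assign_group(meets_case_definition: bool,
                 meets_control_definition: bool,
                 has_partial_case_evidence: bool) -> str:
    """Place one patient into one of the four groups (hypothetical labels)."""
    if meets_case_definition:
        return "definite_case"
    if meets_control_definition:
        return "control"
    if has_partial_case_evidence:
        return "possible_case"
    # Insufficient evidence to be either a case or a control.
    return "excluded"


def needs_manual_review(group_ppv: float,
                        threshold: float = PPV_THRESHOLD) -> bool:
    """A group whose estimated PPV falls below the threshold needs chart review."""
    return group_ppv < threshold
```

For instance, if 96 of 100 algorithm-selected cases in a validation sample are confirmed on chart review, `ppv(96, 4)` meets the example threshold and `needs_manual_review` returns `False`; a group confirmed in only 80 of 100 charts would be flagged for review.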