Practice-Based Evidence: Profiling the Safety of Cilostazol by Text-Mining of Clinical Notes

Background: Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF).
Methods and Results: We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients.
Conclusions: This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult-to-test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.


Introduction
Peripheral arterial disease (PAD) is a growing problem that now accounts for every fifth dollar spent on inpatient cardiovascular care in the United States [1]. This condition affects approximately 8 million Americans, and is associated with significantly impaired long-term cardiovascular outcomes [2]. For example, PAD patients have been shown to have high rates of mortality, stroke and myocardial infarction (MI), with an equal or even greater risk of events than those subjects with a diagnosis of cerebrovascular or coronary artery disease [3]. Patients with claudication also report reduced quality of life, experience higher rates of clinical depression, and are measurably more sedentary than non-PAD patients [4][5][6].
Despite the impact of this disease, very few medical therapies are available to the patient with PAD. Indeed, Cilostazol is the only FDA-approved medication that carries a class I indication for the treatment of intermittent claudication [7]. Cilostazol is a type III phosphodiesterase inhibitor that possesses both vasodilatory and anti-platelet properties, and has been shown to improve maximal walking distance significantly compared to placebo in a series of prospective randomized clinical trials [8,9]. Cilostazol can induce a number of minor side effects such as headache and diarrhea, but generally has been observed to be safe with regard to major cardiovascular events such as myocardial infarction, stroke and death [10,11]. However, other phosphodiesterase inhibitors such as milrinone have been associated with increased mortality rates in patients with congestive heart failure (CHF) [12], and Cilostazol has therefore been issued a black box warning despite never having been shown to increase risk of any major clinical endpoint [12,13]. Prior attempts to quantify this risk were underpowered and did not lead to reversal of the FDA's risk assessment [14]. To further quantify the risk associated with this black box warning, we developed a novel text-analytics pipeline to examine the adverse event profile [15] of Cilostazol in a clinical setting, including in patients with CHF.

Data Sources
We used clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE). For validation of our findings in the CHF subgroup we used data from the Palo Alto Medical Foundation (PAMF).
The STRIDE dataset spans 18 years' worth of data from 1.8 million patients; it contains 19 million encounters, 35 million coded ICD9 diagnoses, and a combination of pathology, radiology, and transcription reports totaling over 11 million unstructured clinical notes.
The PAMF dataset spans 13 years' worth of data from 1.2 million patients; it contains 78 million encounters, 64 million coded ICD9 diagnoses, and a combination of progress notes, pathology, radiology, and transcription reports totaling over 50 million unstructured clinical notes.
The use of these data sources has been approved by the Institutional Review Boards at Stanford and PAMF.

Data Collection and Processing
We processed the unstructured clinical notes as described in Figure 1 and by LePendu et al. [16]. In brief, we used an optimized version of the NCBO Annotator [17,18] with a set of 22 clinically relevant ontologies. We removed ambiguous terms using a variety of statistical and manual filters [19][20][21][22], and flagged negated terms as well as terms attributed to family history [23,24]. We normalized all drugs to their ingredients using RxNorm, such that the terms ''pletal'' and ''cilostazol'' are both normalized to the ingredient Cilostazol. We normalized remaining terms to clinical concepts and aggregated the concepts according to hierarchical relationships, e.g., patients with acute myocardial infarction are also counted as persons with myocardial infarction. Finally, we ordered the set of all concepts for each note based on the time at which the note was recorded. As a result, for every patient, we have sets of concepts spaced apart in time based on the clinical notes they were mentioned in, comprising the patient-feature matrix (see Figure 1).
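The normalization, hierarchical aggregation, and temporal ordering steps described above can be illustrated with a minimal Python sketch. The lookup tables, the function names, and the integer timestamps are hypothetical stand-ins chosen for illustration; the actual pipeline relies on the NCBO Annotator, RxNorm, and 22 ontologies rather than hand-written dictionaries.

```python
from collections import defaultdict

# Hypothetical lookup tables (assumption): the real pipeline uses RxNorm
# and a set of clinical ontologies instead of these toy dictionaries.
TERM_TO_CONCEPT = {
    "pletal": "cilostazol",
    "cilostazol": "cilostazol",
    "heart attack": "myocardial infarction",
    "acute myocardial infarction": "acute myocardial infarction",
}
# Toy hierarchy: child concept -> parent concept.
PARENT = {"acute myocardial infarction": "myocardial infarction"}


def expand_with_ancestors(concept):
    """Return the concept followed by all of its ancestors in the toy hierarchy."""
    out = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        out.append(concept)
    return out


def build_timeline(notes):
    """Convert notes into time-ordered concept sets per patient.

    notes: iterable of (patient_id, timestamp, [terms]); terms are assumed
    to be already filtered for negation and family history.
    Returns {patient_id: [(timestamp, set_of_concepts), ...]} sorted by time.
    """
    timeline = defaultdict(list)
    for pid, ts, terms in notes:
        concepts = set()
        for term in terms:
            concept = TERM_TO_CONCEPT.get(term.lower())
            if concept is not None:
                concepts.update(expand_with_ancestors(concept))
        timeline[pid].append((ts, concepts))
    for pid in timeline:
        # Sort on the timestamp only; sets are not orderable.
        timeline[pid].sort(key=lambda item: item[0])
    return dict(timeline)


notes = [
    ("p1", 20, ["Pletal"]),
    ("p1", 10, ["acute myocardial infarction"]),
]
timeline = build_timeline(notes)
```

In this toy example the brand name is mapped to its ingredient, and a mention of acute myocardial infarction is also counted under the broader myocardial infarction concept, mirroring the aggregation rule described in the text.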
We recognize drug exposure and clinical conditions based on the temporally ordered concept mentions. We validated the accuracy of this recognition using a manually annotated gold standard corpus (from the 2008 i2b2 Obesity Challenge [25]). This corpus was manually annotated by two annotators for 16 conditions and was designed to evaluate the ability of NLP systems to identify a condition present for a patient based on textual notes. On average, we achieved 98% specificity for recognizing disease conditions with a precision of 90%. In particular, for PAD we have 98% specificity (with 83% precision) and for CHF 95% specificity (with 92% precision). We trade sensitivity for high specificity and precision; sensitivity is around 73%. However, given the large dataset we begin with, we are still able to identify large enough cohorts for the study. Drug recognition is done in a similar manner using strings from RxNorm. An independent study at the University of Pittsburgh, which manually examined the annotations on 1960 clinical notes [26], estimated over 84% sensitivity and 84% specificity for recognizing drugs.
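For readers less familiar with these evaluation measures, the following Python sketch shows how sensitivity, specificity, and precision follow from confusion-matrix counts. The counts are hypothetical and were chosen only so that the resulting proportions resemble the average figures quoted above; they are not the counts from the i2b2 evaluation.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and precision from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were found
        "specificity": tn / (tn + fp),  # share of non-cases correctly rejected
        "precision": tp / (tp + fp),    # share of flagged cases that are true
    }


# Hypothetical counts (assumption), not taken from the study.
metrics = classification_metrics(tp=73, fp=8, tn=392, fn=27)
```

The example makes the trade-off explicit: a conservative annotator misses some true cases (lower sensitivity) but rarely flags a patient who does not have the condition.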

Study Covariates and Outcome Variables
We defined several covariates for propensity score matching and several outcome variables for comparison. Each variable is composed of a set of concepts, and each concept contains several terms. For example, the variable ''myocardial infarction'' is composed of 18 different concepts, including C0027051 (myocardial infarction), C0340324 (silent myocardial infarction) and C0155626 (acute myocardial infarction), etc. (see Material S1). Each of these concepts can be further decomposed into the terms, which are actually mentioned in the clinical notes. For example, the terms ''heart attack'' and ''myocardial infarction'' both count as mentions of the concept C0027051 (myocardial infarction). The list of concepts and terms defining the covariates as well as the outcome variables used in this study was manually curated and can be found in the Material S1.
We defined an index time point of treatment for all patients, and divided all annotations into two groups: concepts associated with clinical events that happened before treatment (which can therefore be used for matching patients) and concepts associated with events that happened after the treatment (and can therefore be interpreted as outcomes). We scanned the annotations of each patient for the occurrence of concepts before and after the index time point to create a binary matrix, where for each patient we set each variable to 1 or 0 to indicate whether or not its concepts had been mentioned in the clinical notes. We extracted the demographic variables age, gender and race, and used a cross-reference of the STRIDE data with the Social Security Death Index (SSDI) to define the outcome variable ''death (SSDI)''.
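The construction of the binary before/after indicators can be sketched as follows. This is a minimal illustration, assuming integer timestamps and assuming that mentions recorded exactly at the index time point are counted as outcomes (the text does not specify this boundary case). The three myocardial infarction concept identifiers are those named in the text; the stroke identifier is a placeholder.

```python
def split_by_index(timeline, index_time, variables):
    """Build binary covariate and outcome indicators for one patient.

    timeline: [(timestamp, set_of_concepts), ...]
    variables: {variable_name: set_of_concepts_defining_it}
    Returns (before, after) dictionaries mapping variable name to 0 or 1.
    """
    before = {name: 0 for name in variables}
    after = {name: 0 for name in variables}
    for ts, concepts in timeline:
        # Assumption: strictly earlier mentions are covariates,
        # mentions at or after the index time are outcomes.
        target = before if ts < index_time else after
        for name, members in variables.items():
            if concepts & members:
                target[name] = 1
    return before, after


variables = {
    "myocardial infarction": {"C0027051", "C0340324", "C0155626"},
    "stroke": {"STROKE_CONCEPT"},  # placeholder identifier (assumption)
}
patient_timeline = [(5, {"C0155626"}), (15, {"STROKE_CONCEPT"})]
covariates, outcomes = split_by_index(patient_timeline, index_time=10, variables=variables)
```

Stacking the `covariates` and `outcomes` dictionaries across patients yields the binary matrix described above, with one row per patient and one column per variable.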

Study Period and Study Groups
We extracted data from our annotations for all patients with PAD, as defined by mention of the peripheral artery disease terms listed in Table 1. To allow a detailed analysis of multiple clinical endpoints, we excluded patients having less than one year's worth of data after their first PAD mention to ensure sufficient clinical follow up data for each patient. For the Cilostazol study group, we selected those PAD patients who had a Cilostazol mention after or at the same time as their first PAD mention. We then used 1:5 propensity score matching to define a control group.
To summarize, patients in the Cilostazol study group met the following criteria: (i) they had to have a diagnosis of PAD as defined by mention of the PAD-related terms listed above, (ii) the first PAD mention had to be before or at the same time as the Cilostazol mention, and (iii) the patients were required to have at least one year's worth of data after their first PAD mention. The control group similarly carried a diagnosis of PAD, had no mention of Cilostazol, and was matched to the Cilostazol group by propensity score matching based on expert-selected variables.
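The inclusion criteria summarized above can be expressed as a small decision function. The sketch below is illustrative only: it assumes that times are given in days, that one year corresponds to 365 days, and that control candidates are subject to the same follow-up requirement; the function and argument names are hypothetical.

```python
def classify_patient(first_pad, first_cilostazol, last_record, min_followup=365):
    """Assign a patient to a study group based on the inclusion criteria.

    All arguments are times in days; None means "never mentioned".
    Returns "cilostazol", "control_candidate", or None if excluded.
    """
    if first_pad is None:
        return None  # criterion (i): a PAD mention is required
    if last_record - first_pad < min_followup:
        return None  # criterion (iii): at least one year of data after PAD
    if first_cilostazol is None:
        return "control_candidate"  # PAD patient with no Cilostazol mention
    if first_cilostazol >= first_pad:
        return "cilostazol"  # criterion (ii): PAD at or before Cilostazol
    return None  # Cilostazol mentioned before PAD: in neither group


groups = [
    classify_patient(0, 30, 400),
    classify_patient(0, None, 400),
    classify_patient(0, 30, 200),
    classify_patient(50, 10, 500),
]
```

Control candidates identified this way would still need to pass through propensity score matching before being used as the control group.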

Congestive Heart Failure Study Subgroup
In addition to the total PAD group, we also extracted patients who had a mention of CHF in their clinical notes before the first mention of Cilostazol. The electronic records of these subjects containing the CHF annotation were manually reviewed to confirm the clinical diagnosis of CHF and to ensure the correctness of the temporal ordering. We then used 1:5 propensity score matching to construct a control group from all other PAD patients who also had a history of CHF, but no Cilostazol prescription.

Propensity Score Matching and Statistical Methods
We used propensity score matching to construct control groups. For this purpose, we first fit a propensity score model using logistic regression where the treatment assignment (Cilostazol vs. no Cilostazol) was regressed on the 18 covariates marked in Table 2, including the demographic variables (age at first PAD mention, gender and race) as well as several co-morbidities and co-prescriptions. We then used the Matching package for R [27] to perform 1:5 propensity score matching without replacement and to check balance in the variables between the Cilostazol and control groups. We analyzed the success of the matching (that is, whether covariate values were balanced across the two groups after matching) by testing for significant differences in means for continuous variables and significant differences in percentages for indicator variables, using a p-value significance level of 0.05. To account for the matched nature of the data, we then used conditional logistic regression [28] from the Survival package for R [29] to compute odds ratios and 95% confidence intervals for several outcome variables. The same analysis was performed for the patients with a history of CHF. Furthermore, we performed standard multivariate logistic regression to compute odds ratios which: 1) compare the Cilostazol group with all other unmatched PAD patients, and 2) adjust for confounding by including several covariates, as well as the propensity scores themselves, in the regression model (see Material S2).
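To make the idea of 1:k matching without replacement concrete, the following Python sketch performs a greedy nearest-neighbour match on precomputed propensity scores. It is a simplified stand-in, not the algorithm of the Matching package for R used in the study: it assumes the scores have already been estimated by logistic regression, and the patient identifiers and score values are hypothetical.

```python
def match_controls(treated, controls, ratio=5):
    """Greedy nearest-neighbour matching on the propensity score.

    treated, controls: {patient_id: propensity_score}
    Each control is used at most once (matching without replacement).
    Returns {treated_id: [matched_control_ids]}.
    """
    available = dict(controls)
    matches = {}
    # Process treated patients in order of increasing score (arbitrary choice).
    for pid, score in sorted(treated.items(), key=lambda kv: kv[1]):
        nearest = sorted(available, key=lambda c: abs(available[c] - score))[:ratio]
        matches[pid] = nearest
        for control_id in nearest:
            del available[control_id]  # remove so it cannot be reused
    return matches


# Hypothetical scores (assumption); ratio=2 keeps the example small.
treated = {"t1": 0.30, "t2": 0.70}
controls = {"c1": 0.10, "c2": 0.28, "c3": 0.33, "c4": 0.65, "c5": 0.72, "c6": 0.90}
matched = match_controls(treated, controls, ratio=2)
```

A greedy pass is order-dependent and does not guarantee the globally best assignment, which is one reason balance in the covariates has to be checked after matching, as described above.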

Results
In the current paper, we describe a study performed using free-text clinical notes from the clinical data warehouse at Stanford. Our text-processing pipeline converts clinical notes from a patient's medical record into a patient-feature matrix for data mining as described in the Methods. In order to study the outcomes in patients with PAD taking Cilostazol, we tested for differences in several clinical outcomes between patients taking Cilostazol and a matched control group. As described in the Methods, we defined an index time point (the time point at which treatment for PAD started) and scanned the patient's annotations for occurrence of the variables before and after that time point. We then used variables mentioned before the index time point for propensity score matching and variables mentioned after that time point as outcome variables. We compared outcomes between the 232 patients on Cilostazol in STRIDE and their matched controls, testing for significant differences in major adverse cardiovascular events (MACE), major adverse limb events (MALE), and symptoms of arrhythmias. We also examined a small cohort of patients with congestive heart failure who were prescribed Cilostazol and validated our findings for the CHF subgroup in an independent dataset.

Propensity Score Matching
In total, there were 11,435 PAD patients in STRIDE. Amongst the entire cohort, there was no difference in mortality (OR = 1.08, CI [0.86, 1.35]) comparing 340 Cilostazol patients with the other 11,095 PAD patients, as assessed by query of the SSDI. In order to carry out a more detailed analysis of multiple clinical endpoints such as MACE and MALE, we restricted our study set of 11,435 PAD patients to those with at least one year's worth of data after their first PAD mention, as described in the Methods.
Table 2 summarizes the prevalence of several clinical variables in the Cilostazol study group and the unmatched PAD control patients. On average, the Cilostazol patients are older, are more likely male, have more comorbidities, are prescribed more medications and have had more major adverse limb events than PAD patients not taking Cilostazol (p-value < 0.05 for each condition); hence on average Cilostazol patients are sicker than the other PAD patients. After using propensity score matching, we were able to identify a cohort of 1,160 controls (1:5 matching) that were fully balanced for all 18 clinical variables (see Table 2). This group was used to compare all subsequent clinical outcomes. In total, 5,892 patient-years of data were available for the subjects studied, compared to 2,136 patient-years in [14].
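As an aid to interpreting the odds ratios and confidence intervals reported in this section, the sketch below computes a crude odds ratio with a Wald 95% confidence interval from a 2x2 table. This is a simplified illustration and not the regression-based procedure used in the study (matched comparisons used conditional logistic regression); the cell counts are hypothetical and do not come from STRIDE.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts (assumption), for illustration only.
crude_or, ci_lower, ci_upper = odds_ratio_wald(a=20, b=80, c=10, d=90)
```

An interval that contains 1, such as the mortality interval [0.86, 1.35] quoted above, indicates no statistically significant association at the 5% level.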

Outcomes in PAD Patients Taking Cilostazol
Differences in claudication symptoms. We first quantified the frequency with which subjects in each group reported improvement or resolution of claudication symptoms over time. We were able to 're-discover' that Cilostazol use was associated with a significant reduction in symptomatology [30], defined by mentions of phrases such as ''no claudication'', ''no complaints of claudication'', or ''no sign of claudication'' after assignment to the Cilostazol group (OR = 2.35, CI [1.75, 3.14]), thus providing a positive control for our approach.
Another example showing that such text-mining approaches do not always yield negative findings is our recently published study, in which we used similar techniques to detect adverse drug reactions from clinical notes. That study achieved 80.4% AUC on a gold standard of positive and negative drug-adverse event associations and detected 6 out of 9 recalls in the past decade, including the association between Vioxx and myocardial infarction [15,16].
Differences in major adverse cardiovascular events (MACE). To assess the impact of Cilostazol therapy on major clinical outcomes, we then computed odds ratios for several major adverse cardiovascular events (MACE), including myocardial infarction, stroke, cardiac arrest, sudden cardiac death and defibrillation events. Compared to the entire unmatched PAD cohort, those prescribed Cilostazol had slightly higher rates of MACE (crude OR = 1.37; see Figure 2A). Similar results were obtained when adjusting the crude odds ratios for different potential confounders (see Material S2).
Differences in major adverse limb events (MALE). To assess the impact of Cilostazol therapy on PAD-specific outcomes, we next compared major adverse limb events (MALE) such as amputation and lower extremity revascularization. As expected, the Cilostazol group had much more advanced PAD than the unmatched control PAD group, with significantly higher rates of MALE (crude OR = 6.26, CI [4.30, 9.13]) and of each PAD-specific endpoint (see Material S2). Compared to the matched control group, the odds ratios were reduced, but the difference between the groups remained significant for MALE (Figure 2B). Again, similar results were obtained using different ways to adjust for confounders (see Material S2).

Differences in arrhythmias and arrhythmic symptoms. Despite the concern that Cilostazol may increase malignant arrhythmias, we did not observe any statistically significant differences between the Cilostazol and control PAD patients (either before or after matching) with respect to cardiac arrhythmias or typical arrhythmia symptoms (see Figure 2C and Material S2).

Outcomes in PAD Patients with CHF Taking Cilostazol
We identified several patients who had an annotation of CHF before the first mention of Cilostazol. After manually reviewing their medical records, we confirmed that 43 patients with a diagnosis of CHF were subsequently prescribed Cilostazol for PAD. We used these patients to comprise a CHF study subgroup. Again, we observed an imbalance in several variables including gender, several co-prescriptions and history of revascularization events. Using propensity score matching, we extracted a control group of 215 PAD patients who also had a history of CHF but were not prescribed Cilostazol, and then compared both groups with respect to different outcomes. Matching removed pre-existing imbalance in the covariates (see Material S3). Importantly, Cilostazol use was not associated with an increase in any major adverse cardiovascular event amongst heart failure patients. Similarly, no increase in arrhythmia, arrhythmic symptoms, or sudden cardiac death was observed in this subgroup analysis (see Figure 3). We again observed slightly increased odds ratios for major adverse limb events, in particular revascularization events, confirming that the PAD of the Cilostazol patients was more advanced.
We also extracted data for 96 PAD patients with a history of CHF who were prescribed Cilostazol from an independent data source at PAMF. We manually validated the CHF subgroup in the same way as for the STRIDE dataset. Using propensity score matching, we constructed a fully balanced matched control group of 480 patients (for balance analysis see Material S4), and analyzed differences in clinical outcomes between the two groups using the same methods as for the STRIDE data. We observed the same trend as seen for the STRIDE data in Figure 3 (see Table 3).

Discussion
In this study, we employed a novel analytical approach to conduct the equivalent of a phase IV safety surveillance study on an efficacious, yet potentially dangerous FDA-approved drug. By querying the clinical medical records of over 1.8 million patients with our pipeline, we were able to identify a large cohort of PAD subjects that were matched with the exception of exposure to Cilostazol, the agent of interest in this study. Using this approach, we did not observe any difference in mortality comparing the Cilostazol patients to all other unmatched PAD patients. We furthermore observed no association between Cilostazol and any major adverse cardiovascular event including stroke, myocardial infarction or death in a reduced fully matched study set, which is in good agreement with earlier studies [31]. We also identified a subset of CHF patients who were prescribed Cilostazol, and interestingly found that it did not appear to increase mortality in this theoretically high-risk group of patients. This proof of principle study shows the potential of data-mining methods to query unstructured data in clinical data warehouses to answer important, but difficult to address clinical questions [32]. Moreover, it argues for a prospective study to examine the validity of an unproven FDA-issued black box warning that likely limits the broad application of a clinically effective therapy.
In many situations, clinical hypotheses go untested due to ethical concerns around presumed benefit. Examples include the use of PVC-suppressing antiarrhythmics post MI or hormone replacement in menopausal women, each of which was found to promote, not prevent, risk when formally tested [33,34]. Similarly, clinical trials often do not study the most complicated patients due to concerns over the impact of comorbidities, and clinicians often have little data to guide therapy for the sickest patients. We argue that in the era of electronic medical records, it is possible to harness the knowledge embedded in clinical data warehouses to inform therapy decisions [32] as well as perform phase IV surveillance [15,16,35]. The informatics approaches employed in the current study allow for uncovering 'natural experiments' that would otherwise be difficult to perform, thereby generating practice-based evidence.
By looking at large enough sample sets, it is possible to identify patients of interest who have been exposed to a given treatment approach, compare them to patients who are otherwise indistinguishable, and observe their clinical outcomes for significant differences. Because this work is performed with data from a 'real world' clinical setting, patients who would have been excluded from most clinical trials are also examined, such as the patients with recognized CHF who were prescribed Cilostazol. Given Cilostazol's black box warning, it is difficult to imagine a scenario where these patients would have been enrolled into a trial that was supported by a pharmaceutical company and endorsed by an academic Institutional Review Board. While our findings do not prove that Cilostazol is safe in heart failure patients, they help make the case for a prospective study in this cohort.
Because the full medical record can be queried, this approach also offers the benefit of allowing a wide spectrum of endpoints to be assessed. Also, at-risk and other understudied subgroups such as children, the elderly, minorities, pregnant women and those with multiple comorbidities could be studied with this approach. In the current study, we focused heavily on potential arrhythmic complications given the high incidence of palpitations reported in the original Cilostazol studies. Importantly, no increase in arrhythmia was observed and there was no increase in total mortality or sudden cardiac death.
This study has several potential limitations that warrant discussion. Although our annotation pipeline has been shown to have a specificity of 98% for recognizing diseases, we could have missed comorbidities due to false negatives from lower sensitivity (73%). However, these errors should be equally distributed across case and control groups. We performed standard propensity score matching in order to reduce potential bias introduced by imbalance in the covariates; however, matching may not have been complete. For example, we did not have access to the subjects' ankle-brachial indices, and therefore could not quantitate the severity of each patient's peripheral stenosis at baseline. Indeed, we observed that the Cilostazol group had higher rates of MALE than control subjects. While we cannot exclude the possibility that Cilostazol promotes the progression of PAD, we view this as unlikely given the multiple published randomized, placebo-controlled trials demonstrating efficacy of Cilostazol [10,11]. Rather, we suspect that the groups were not completely matched for PAD severity at baseline, given that Cilostazol is generally prescribed to subjects with lifestyle-limiting claudication [3,36,37,38]. As a result, the Cilostazol group may have had higher-grade ischemic lesions, which necessitated the observed increase in peripheral interventions and MALE. However, if an unmeasured residual imbalance was present, it would bolster the interpretation that Cilostazol is likely safe from a cardiovascular mortality perspective, in that the treatment group presumably had more advanced atherosclerosis, yet had no increase in arrhythmia or cardiovascular events when taking the drug. Moreover, we applied different models including a variety of additional potential confounders and the results did not change (for details see Material S2). Finally, the outcome measures may not have captured events that occurred outside of the hospital or that led to hospitalizations in other institutions. However, we note that the endpoint of death was captured for all patients via cross-referencing with the Social Security Death Index data, giving confidence in our conclusions about survival. Also, our 're-discovery' that Cilostazol reduces claudication complaints provides a 'positive control' to illustrate the potential of our approach for detecting subjective clinical endpoints.
In conclusion, we used an informatics approach to examine the side-effect profile of Cilostazol and to indirectly assess the validity of a black box warning that was originally issued over theoretical concerns. We find that the feared complications of malignant arrhythmia and sudden death were not observed in association with the drug in the cohort examined. We used our analytics approach to discover and examine a 'natural experiment' in a subset of patients that would be difficult to enroll in a clinical trial and found that Cilostazol had no untoward effect on survival amongst heart failure patients. This result supports the argument for a prospective randomized trial in CHF patients, which need not be considered unsafe or unethical.
We believe that similar Phase IV monitoring could be executed for other drugs without a proven safety record to identify sequelae not recognized at the time of FDA review. We expect that such data-mining driven surveillance approaches will have broad applicability to the field of pharmaceutical safety and will become a key aspect of Phase IV post-marketing surveillance, particularly for patient groups not likely to be studied in randomized clinical trials.

Supporting Information
Material S1 Variable definitions: The concepts and terms defining the variables used in this study. The table also includes the frequency of each concept/term in the PAD cohort. (XLSX)
Material S2 Outcomes analysis using multivariate logistic regression - STRIDE dataset. (PDF)
Material S3 Balance in variables before and after propensity score matching - CHF subgroup in STRIDE. (PDF)
Material S4 Balance in variables before and after propensity score matching - CHF subgroup in PAMF. (PDF)