Can We Predict Individual Combined Benefit and Harm of Therapy? Warfarin Therapy for Atrial Fibrillation as a Test Case

Objectives To construct and validate a prediction model for individual combined benefit and harm outcomes (stroke with no major bleeding, major bleeding with no stroke, neither event, or both) in patients with atrial fibrillation (AF) with and without warfarin therapy. Methods Using the Kaiser Permanente Colorado databases, we included patients newly diagnosed with AF between January 1, 2005 and December 31, 2012 for model construction and validation. The primary outcome was a prediction model of the composite of stroke or major bleeding, built using polytomous logistic regression (PLR) modelling. The secondary outcome was a prediction model of all-cause mortality, built using Cox regression modelling. Results We included 9074 patients: 4537 warfarin users and 4537 non-users. In the derivation cohort (n = 4632), 136 strokes (2.94%), 280 major bleedings (6.04%) and 1194 deaths (25.78%) occurred. In the prediction models, warfarin use was not significantly associated with risk of stroke, but it increased the risk of major bleeding and decreased the risk of death. Both the PLR and Cox models were robust and were internally and externally validated, with acceptable model performance. Conclusions In this study, we introduce a new methodology for predicting individual combined benefit and harm outcomes associated with warfarin therapy for patients with AF. Should this approach be validated in other patient populations, it has potential advantages over existing risk stratification approaches as a patient-physician aid for shared decision-making.


Introduction
Atrial fibrillation (AF) is a common, age-related, chronic arrhythmia that is a major risk factor for stroke and mortality [1,2]. AF independently increases the risk of stroke five-fold [3] and doubles the risk of death from AF-related stroke [2]. At present, oral anticoagulants are the mainstay for stroke prophylaxis in patients with AF [4]. Despite the growth in use of newer oral anticoagulants, warfarin remains a frequently used antithrombotic therapy for AF, where it lowers rates of stroke as well as mortality [2,4-6]. However, the use of anticoagulants is also associated with an increased risk of major bleeding, including intracranial hemorrhage (ICH) [5]. Thus, this combination of potential life-saving benefit and life-threatening harm may dissuade clinicians from prescribing warfarin for eligible patients [7-11].
Clinical prediction rules such as the CHADS2 (Congestive heart failure, Hypertension, Age > 75 years, Diabetes, Previous stroke [2 points]) and the CHA2DS2-VASc (Congestive heart failure; Hypertension; Age ≥ 75 years [2 points]; Diabetes mellitus; Stroke [2 points]; Vascular disease; Age 65-74 years; and Sex category [female]) scores have been developed and widely used to predict stroke risk in AF patients [2,5,12,13]. Likewise, the HAS-BLED score (Hypertension; Abnormal renal/liver function; Stroke history; Bleeding history or predisposition; Labile international normalized ratio [INR]; Elderly [>65 years]; Drugs/alcohol concomitantly) has been validated to predict risk of major bleeding with warfarin therapy [2,5,14-17]. Unfortunately, the CHADS2, CHA2DS2-VASc and HAS-BLED scores were not derived from the same patients or populations. Specifically, the CHADS2 used data from 1733 patients in the US National Registry of AF [13], while the CHA2DS2-VASc and HAS-BLED scores were both developed from the Euro Heart Survey on AF population but used data on 1084 and 3978 patients, respectively [12,14]. Thus, these scores cannot simultaneously assess a patient's potential for benefit and/or harm with warfarin therapy, yet this is exactly what each patient wants to know [18].
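As an illustration of how such a point-based rule is applied, the following minimal sketch sums the CHA2DS2-VASc components listed above (age 75 years or older scoring 2 points). The function name, argument names, and the example patient are hypothetical and are not taken from the study.

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 prior_stroke, vascular_disease):
    """Sum the CHA2DS2-VASc points for one patient.

    `age` is in whole years; all other arguments are booleans.
    """
    score = 0
    score += 1 if chf else 0
    score += 1 if hypertension else 0
    score += 1 if diabetes else 0
    score += 2 if prior_stroke else 0       # stroke history counts double
    score += 1 if vascular_disease else 0
    score += 1 if female else 0             # sex category
    if age >= 75:
        score += 2                          # age 75 or older counts double
    elif age >= 65:
        score += 1                          # age 65-74
    return score


# Hypothetical patient: 82-year-old woman with CHF and diabetes.
print(cha2ds2_vasc(age=82, female=True, chf=True, hypertension=False,
                   diabetes=True, prior_stroke=False, vascular_disease=False))
```

Every component contributes a fixed number of points regardless of the patient's other characteristics, which is the property the individual-predictor models in this study are meant to relax.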
While the CHADS2, CHA2DS2-VASc and HAS-BLED scores help estimate an individual's chance of benefit and harm separately, a more sophisticated methodology is needed. The 'net benefit' approach involves calculating the main benefit of warfarin therapy (reduced risk of stroke or systemic embolism) and then deducting the main harm (weight × increased risk of ICH, weight = 1.5) in the same population [19-21]. However, this approach does not take into account gastrointestinal (GI) bleeding risk, and the weighting for ICH is chosen arbitrarily.
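The 'net benefit' arithmetic described above can be sketched as follows. The function and argument names are hypothetical, and the example rates are invented purely to show the calculation; only the ICH weight of 1.5 comes from the text.

```python
def net_benefit(stroke_rate_off, stroke_rate_on,
                ich_rate_off, ich_rate_on, ich_weight=1.5):
    """Net benefit = stroke events prevented - weight * excess ICH events.

    All rates must share one unit (for example, events per 100 patient-years).
    """
    benefit = stroke_rate_off - stroke_rate_on   # reduction in stroke risk
    harm = ich_weight * (ich_rate_on - ich_rate_off)  # weighted ICH excess
    return benefit - harm


# Invented rates, per 100 patient-years, for illustration only.
print(net_benefit(stroke_rate_off=4.0, stroke_rate_on=2.0,
                  ich_rate_off=0.2, ich_rate_on=0.6))
```

Because the result is a single population-level number, it cannot express the probability that a given patient experiences harm without benefit, and GI bleeding does not enter the calculation at all.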
In general, treatment effects of warfarin therapy for individual patients can be divided into four quadrants: 1) benefit without harm; 2) harm without benefit; 3) neither benefit nor harm; and 4) both benefit and harm simultaneously (Table 1). A method for predicting the probabilities of the four outcome quadrants (i.e., individualized combined benefit and harm outcomes) for each patient is needed. Polytomous logistic regression (PLR) modelling is suited to this prediction task because the outcome has four multinomial levels [22,23]. Therefore, the objective of this study was to use PLR modelling to construct and externally validate a prediction model for patients' individual combined benefit and harm outcomes (stroke with no major bleeding, major bleeding with no stroke, neither event, or both stroke and major bleeding) with and without warfarin therapy for AF. In real-world clinical settings, the prediction of individualized combined benefit and harm outcomes related to warfarin therapy could assist with the patient-physician shared decision-making process.

Study design and setting
The methods have been described in detail previously [18]. Briefly, Kaiser Permanente Colorado (KPCO), a non-profit, integrated health care delivery system in the U.S. Denver-Boulder metropolitan area, utilizes a centralized anticoagulation service that provides anticoagulation services for KPCO patients with AF [24,25].
Patients newly diagnosed with AF between January 1, 2005 and December 31, 2012 were included. Newly diagnosed status was defined by the absence of an AF diagnosis in the previous 180 days. Patients were followed for up to 180 days after AF diagnosis to assess whether warfarin therapy was initiated. Patients who had at least one warfarin purchase were grouped as warfarin users, and those with no warfarin purchases as non-users. Warfarin non-users were randomly matched 1:1 to warfarin users on year of AF diagnosis [26]. Patients with AF diagnosed between January 1, 2005 and December 31, 2008 comprised the derivation cohort (KPCO-I), while patients with AF diagnosed between January 1, 2009 and December 31, 2012 comprised the validation cohort (KPCO-II). Compared with internal validation by randomly splitting the entire dataset, separating derivation and validation cohorts by AF diagnosis dates enabled an external validation of the model independent of the original data and development process [27]. In addition, the separation by dates of AF diagnosis could also account for changes in standards of care and management for patients over time.
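The cohort definitions above can be expressed as a few date rules. This is a minimal sketch, not the study's extraction code: the function names are hypothetical, and the handling of boundary days (for example, whether day 180 itself falls inside a window) is an assumption.

```python
from datetime import date, timedelta


def is_newly_diagnosed(diagnosis_date, prior_af_dates, lookback_days=180):
    """True if no AF diagnosis was recorded in the lookback window."""
    window_start = diagnosis_date - timedelta(days=lookback_days)
    return not any(window_start <= d < diagnosis_date for d in prior_af_dates)


def is_warfarin_user(diagnosis_date, purchase_dates, window_days=180):
    """True if at least one warfarin purchase falls within the window."""
    window_end = diagnosis_date + timedelta(days=window_days)
    return any(diagnosis_date <= d <= window_end for d in purchase_dates)


def assign_cohort(diagnosis_date):
    """Split patients into derivation and validation cohorts by diagnosis date."""
    if date(2005, 1, 1) <= diagnosis_date <= date(2008, 12, 31):
        return "KPCO-I"    # derivation cohort
    if date(2009, 1, 1) <= diagnosis_date <= date(2012, 12, 31):
        return "KPCO-II"   # validation cohort
    return None            # outside the study period


print(assign_cohort(date(2007, 5, 14)))
```

Splitting on calendar time rather than at random means the validation cohort is never seen during model development, which is the rationale given above for treating it as an external validation.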

Study patients
The date of AF diagnosis for each patient was defined as the study start date. To include as many outcomes as possible, the study outcome end date was defined as June 30, 2009 and June 30, 2013 for the derivation and validation cohorts, respectively. To control for potential immortal time bias, the study index date for warfarin users was defined as the first warfarin purchase date after the start date [28,29]. Warfarin non-users were assigned an index date corresponding to the length of time from study start date to the index date of their randomly matched warfarin user [26]. Warfarin non-users who died prior to their assigned index date were excluded from the analyses, because they could not have been chosen to enter the cohort [26]. Patients were followed from index date until KPCO plan disenrollment, death, or study outcome end date, whichever came first [18].
Outcomes were assessed from the index date to the outcome end date. For the prediction model of stroke or major bleeding, we categorized patients into one of the four outcome groups based on their survival time to first event: stroke with no major bleeding, major bleeding with no stroke, neither event, or both stroke and major bleeding. For the prediction model of all-cause mortality, patients were categorized into survival or non-survival groups.
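The index-date and follow-up rules can be sketched as below. The function names and example dates are hypothetical; the sketch assumes that a non-user who dies on the assigned index date itself is retained, which the text does not specify.

```python
from datetime import date


def nonuser_index_date(nonuser_start, matched_user_start, matched_user_index):
    """Shift the non-user's start date by the matched user's start-to-index lag."""
    lag = matched_user_index - matched_user_start
    return nonuser_start + lag


def eligible_nonuser(index_date, death=None):
    """Exclude non-users who died before their assigned index date."""
    return death is None or death >= index_date


def followup_end(outcome_end, disenrollment=None, death=None):
    """Follow-up stops at disenrollment, death, or outcome end date, whichever is first."""
    candidates = [d for d in (disenrollment, death, outcome_end) if d is not None]
    return min(candidates)


# Hypothetical matched pair: the user first purchased warfarin 20 days after diagnosis.
index = nonuser_index_date(date(2006, 3, 1), date(2006, 1, 10), date(2006, 1, 30))
print(index, followup_end(date(2009, 6, 30), death=date(2008, 5, 1)))
```

Giving the non-user the same waiting period as the matched user keeps the time between diagnosis and treatment start from being credited to warfarin as event-free survival.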
Stroke and major bleeding events were identified during an ambulatory KPCO medical office visit, emergency department (ED) visit, or inpatient stay using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes in the primary position. Major bleeding was defined as bleeding that led to a hospital admission or an ED visit requiring a transfusion [30]. However, bleeding that caused a drop in hemoglobin of ≥ 20 g/L but did not necessitate a transfusion [31] was not included as major bleeding since no inpatient or ED hemoglobin laboratory values were available. ICH was categorized as major bleeding, rather than stroke. Stroke or major bleeding occurring before the index date was categorized as a risk factor (i.e., prior stroke, prior major bleeding) rather than a study outcome [18].

Potential predictors of benefit and/or harm
The potential predictors used in this study included patients' demographic characteristics (i.e., sex, age), laboratory measures, baseline comorbidities, warfarin use, and concurrent use of medications that interact with warfarin. Laboratory measurements included INR, hemoglobin, serum creatinine and albumin recorded closest to, but before, the index date. Comorbidities were identified from ambulatory KPCO medical office visits in the 180 days prior to the index date. Comorbidities were components of the CHA2DS2-VASc and HAS-BLED schemes, as well as components included in the Charlson Comorbidity Index [32]. Data on warfarin use included the length of time from study start date to the first purchase date, the length of time for each dispensed warfarin prescription from index date, and days of warfarin supplied. Concurrent use of other medications included purchases of medications made during the 90 days after index date. We included concurrent medications for which there was evidence of an interaction that potentiated or inhibited the effect of warfarin. The list of included medications was from two systematic reviews that investigated warfarin interactions with other drugs [33,34].

Statistical analyses
All tests were two-sided with a significance level of 0.05, unless otherwise specified. We described continuous variables as means (± standard deviations [SDs]) and categorical variables as frequencies and percentages. Student's t-tests were used to compare continuous variables and chi-square tests of association were applied for categorical variables. In the derivation and validation cohorts, we assessed trends in stroke and major bleeding incidence rates stratified by the CHA2DS2-VASc score and HAS-BLED score, respectively.

Model building
PLR modeling was used to develop a prediction model for the four individual benefit and harm outcomes using the neither event group as the referent category. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to quantify the relationship between outcomes and predictors. We employed Cox proportional hazards regression analysis to build a prediction model for all-cause mortality, using hazard ratios (HRs) to quantify the associations between predictors and mortality. All of the analyses were adjusted for matching of warfarin users and non-users.
Both the PLR and Cox regression models followed the same procedures for model construction. First, the effect of multicollinearity was evaluated using the criterion of a variance inflation factor ≥ 4 to prune candidate predictors. Subsequently, we performed univariate analyses to select all possible predictors with a p-value ≤ 0.20 to enter the multivariable analyses. Then, the predictors with a p-value < 0.05 in the multivariable analyses were retained in the prediction models. Lastly, we identified significant two-way interactions to finalize our prediction models [35].
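The first three steps of this construction procedure amount to a staged filter, sketched below. This is an illustration under stated assumptions, not the study's SAS or Stata code: the function name, the `fit_multivariable` callback (which stands in for refitting the PLR or Cox model), and all example values are hypothetical, and the comparison operator for the 0.20 entry threshold is assumed to be "less than or equal to".

```python
def select_predictors(vif, univariate_p, fit_multivariable,
                      vif_cutoff=4.0, entry_p=0.20, retain_p=0.05):
    """Return the predictors that survive the three screening steps.

    vif, univariate_p: dicts mapping predictor name to a precomputed value.
    fit_multivariable: hypothetical callback that takes a list of names and
    returns a dict of multivariable p-values for those names.
    """
    # Step 1: prune collinear candidates (variance inflation factor >= cutoff).
    pruned = [name for name, value in vif.items() if value < vif_cutoff]
    # Step 2: keep candidates that pass the univariate entry threshold.
    entered = [name for name in pruned if univariate_p[name] <= entry_p]
    # Step 3: retain predictors that are significant in the multivariable model.
    multivariable_p = fit_multivariable(entered)
    return [name for name in entered if multivariable_p[name] < retain_p]


# Invented values for illustration only.
vif = {"age": 1.2, "chf": 1.8, "creatinine": 5.1, "sex": 1.1}
uni_p = {"age": 0.001, "chf": 0.15, "creatinine": 0.01, "sex": 0.60}
fake_fit = lambda names: {n: {"age": 0.001, "chf": 0.20}[n] for n in names}
print(select_predictors(vif, uni_p, fake_fit))
```

The interaction screening in the final step is omitted here because it requires refitting the model with product terms.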
For the primary outcome, three sensitivity analyses were performed by: 1) using multiple imputations if missing data were ≥ 10%; 2) treating the use of warfarin as a time-dependent covariate to evaluate the effect of warfarin on stroke and major bleeding, using a gap of > 30 days to indicate warfarin discontinuation [36]; and 3) employing a competing risk analysis using the Fine and Gray method to take into account all-cause mortality as a competing risk of stroke and major bleeding [37].
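For the second sensitivity analysis, continuous exposure periods can be derived from dispensing records, as in the sketch below. It assumes that the gap is measured from the end of the previous days' supply to the next fill date, which the text does not state; the function name and example records are hypothetical.

```python
from datetime import date, timedelta


def exposure_intervals(dispensings, max_gap_days=30):
    """Merge warfarin fills into continuous exposure intervals.

    dispensings: iterable of (fill_date, days_supplied) tuples.
    A gap longer than max_gap_days ends the current interval (discontinuation).
    """
    intervals = []
    for fill_date, days_supplied in sorted(dispensings):
        supply_end = fill_date + timedelta(days=days_supplied)
        if intervals and (fill_date - intervals[-1][1]).days <= max_gap_days:
            start, end = intervals[-1]
            intervals[-1] = (start, max(end, supply_end))  # extend current episode
        else:
            intervals.append((fill_date, supply_end))      # new episode
    return intervals


# Invented dispensing history for illustration.
fills = [(date(2006, 1, 1), 30), (date(2006, 2, 15), 30), (date(2006, 6, 1), 30)]
for start, end in exposure_intervals(fills):
    print(start, end)
```

Each resulting interval can then be coded as "on warfarin" in a time-dependent covariate, with the time between intervals coded as "off warfarin".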

Model performance and validation
Comparison between the predicted and observed risks in deciles was used to evaluate calibration of the prediction models. Discrimination was measured by the area under the receiver operating characteristic curves (AUCs) for the PLR model and Harrell's C index for the Cox model. Goodness-of-fit was assessed by a Hosmer-Lemeshow statistic [38] and Gronnesby and Borgan test [39] with ten groups based on the predicted risk scores for the PLR and Cox models, respectively.
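Both performance measures can be computed directly from predicted risks and observed outcomes. The sketch below is a simplified illustration for a single binary outcome indicator (for example, stroke versus the referent category); the function names and toy data are hypothetical, and it is not the Hosmer-Lemeshow or Gronnesby and Borgan test itself.

```python
def auc(scores, labels):
    """Area under the ROC curve as the share of concordant event/non-event pairs."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(events) * len(non_events))


def calibration_by_group(scores, labels, groups=10):
    """Mean predicted risk and observed event rate within risk-ordered groups."""
    ranked = sorted(zip(scores, labels))
    n = len(ranked)
    table = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if chunk:
            mean_predicted = sum(s for s, _ in chunk) / len(chunk)
            observed_rate = sum(y for _, y in chunk) / len(chunk)
            table.append((mean_predicted, observed_rate))
    return table


# Toy data for illustration only.
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
print(auc(predicted, observed))
print(calibration_by_group(predicted, observed, groups=2))
```

With ten groups, the pairs returned by `calibration_by_group` correspond to the decile comparison of predicted and observed risks described above.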
Two internal validations were performed for the PLR model by using 10-fold cross-validation [40] and bootstrap analysis [41]. We also used bootstrap analysis to internally validate the Cox model for all-cause mortality. For the external validation, because the incidence rates of outcomes differed between the derivation and validation cohorts and there was evidence that the original models did not fit the validation cohort well, we updated the models' intercepts as well as the regression coefficients by using the calibration intercepts and calibration slopes [23,42,43]. The evaluation of goodness-of-fit, calibration, and discrimination was repeated in the validation cohort.
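The idea behind updating with a calibration intercept and slope can be sketched for the simpler binary logistic case: the original linear predictor is kept, and only two numbers are re-estimated in the validation data. The sketch below fits them by Newton's method; it is an assumption-laden simplification (binary rather than polytomous, hypothetical function names, toy data) and not the study's procedure.

```python
import math


def recalibrated_probability(lp, intercept, slope):
    """Risk after applying the calibration intercept and slope to the original lp."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * lp)))


def fit_calibration(linear_predictors, outcomes, iterations=25):
    """Estimate calibration intercept a and slope b in logit(p) = a + b * lp."""
    a, b = 0.0, 1.0   # start from the uncorrected model
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for lp, y in zip(linear_predictors, outcomes):
            p = recalibrated_probability(lp, a, b)
            w = p * (1.0 - p)
            g_a += y - p            # gradient of the log-likelihood
            g_b += (y - p) * lp
            h_aa += w               # information matrix entries
            h_ab += w * lp
            h_bb += w * lp * lp
        det = h_aa * h_bb - h_ab * h_ab
        if abs(det) < 1e-12:
            break
        a += (h_bb * g_a - h_ab * g_b) / det   # Newton update
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Toy validation data: original linear predictors and observed outcomes.
lps = [-2, -1, -1, 0, 0, 1, 1, 2]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
print(fit_calibration(lps, ys))
```

A slope below 1 indicates that the original predictions were too extreme for the new cohort, and a non-zero intercept indicates a difference in overall event rate; multiplying the original coefficients by the slope and shifting the intercept yields the updated model.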
Analyses were performed with the software packages SAS Version 9.3 (SAS Institute, Inc., Cary, NC) and STATA Version 12 (Stata Corp., College Station, TX, USA). For the calibration plots of the PLR model, we used the software R version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) with the Design library.
The derivation cohort (KPCO-I) included 4632 patients with a median follow-up of 652 days, while the validation cohort (KPCO-II) included 4442 patients with a median follow-up of 628 days (Table 2). In the KPCO-I cohort, warfarin users were significantly older and had higher proportions of patients with congestive heart failure, hypertension, renal disease, prior major bleeding, anemia, and alcohol abuse than non-users (all p < 0.05). The CHA2DS2-VASc (mean 3.09 versus 2.73) and HAS-BLED (mean 1.80 versus 1.54) scores were higher in warfarin users. A higher proportion of warfarin users than non-users had concurrently purchased an NSAID, antibiotic, cardiac drug, GI drug, or other drug (tramadol). However, a lower percentage of antiplatelet use was observed in warfarin users compared with non-users (p = 0.001). Similar characteristics and comparisons between warfarin users and non-users were found in the KPCO-II cohort (Table 2). S1 Table presents the comparison between warfarin users and non-users in the whole cohort (i.e., KPCO-I combined with KPCO-II), with findings similar to those from the KPCO-I cohort alone.
Twenty-eight patients (12 and 16 in the KPCO-I and KPCO-II cohorts, respectively) had a stroke and major bleeding outcome on the same date; thus, their time to first event could not be identified. Because of the low frequency, these patients were randomly allocated into either stroke with no major bleeding (n = 14) or major bleeding with no stroke (n = 14). Therefore, in the combined cohort, 278 strokes (3.06%), 453 major bleedings (4.99%) and 2186 deaths (24.09%) occurred during follow-up. Of these, 136 strokes (2.94%), 280 major bleedings (6.04%) and 1194 deaths (25.78%) occurred in the KPCO-I cohort. In both the KPCO-I and KPCO-II cohorts, the rates of major bleeding and death, but not stroke, differed between warfarin users and non-users (Table 3).
Also, as shown in S2 Fig, there was a significant difference in all-cause mortality (log-rank p-value = 0.001) between the KPCO-I cohort and KPCO-II cohort.
Significant trends for increasing stroke and major bleeding rates with higher CHA2DS2-VASc and HAS-BLED scores were found (p < 0.001) for both the KPCO-I and KPCO-II cohorts (S2 Table).

PLR Model
The PLR model included age, female sex, warfarin use, CHF, other cerebrovascular disease, hypertension, diabetes, prior major bleeding, prior stroke, renal disease, and concurrent use of antibiotics, antiplatelets, and GI drugs (Table 4). Warfarin use was not associated with stroke (OR = 0.94, 95% CI: 0.66-1.34) but was associated with increased risk of major bleeding

Cox Model
The all-cause mortality model included age, warfarin, anemia, other cerebrovascular disease, CHF, diabetes, hypertension, prior major bleeding, malignancy, and concurrent use of antifungals and antidepressants (Table 5). Warfarin use was associated with a decreased risk of death (HR = 0.55, 95% CI: 0.49-0.62). All other predictors were associated with increased risk of death except hypertension (HR = 0.76, 95% CI: 0.66-0.85).
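Under the proportional hazards assumption, hazard ratios translate into an individual mortality probability through the relation S(t | x) = S0(t) raised to the product of the applicable HRs. The sketch below illustrates this with the two HRs reported above; the baseline survival of 0.90 is an invented placeholder (the study's baseline survival function is not given in this text), the function name is hypothetical, and interactions are ignored.

```python
import math


def mortality_probability(baseline_survival, hazard_ratios):
    """Probability of death by time t: 1 - S0(t) ** (product of hazard ratios)."""
    combined_hr = math.prod(hazard_ratios)
    return 1.0 - baseline_survival ** combined_hr


# Assumed baseline survival at some time t for a reference patient: 0.90.
baseline = 0.90
print(mortality_probability(baseline, []))            # reference patient
print(mortality_probability(baseline, [0.55]))        # warfarin user
print(mortality_probability(baseline, [0.55, 0.76]))  # warfarin user with hypertension
```

A hazard ratio below 1 raises the survival curve, so the predicted probability of death falls as protective factors are added.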

Sensitivity Analyses
When warfarin use was treated as a time-dependent covariate, similar associations between warfarin and outcomes were found as in the PLR model for stroke and major bleeding and the Cox model for all-cause mortality (S3 Table). Results from the competing risk sensitivity analysis for stroke and major bleeding identified similar coefficients for all the predictors included in the PLR model, indicating the robustness of the prediction model (Table 6).

Model performance and validation
The prediction models had a good fit to the data in the derivation cohort (p > 0.05) (Table 7).
The discrimination of the models (AUC = 0.71 and 0.72 for stroke and major bleeding, respectively, and C index = 0.75 for all-cause mortality) was acceptable. The overall calibration of the PLR model (S3 Fig and S4 Fig) [...] coefficients as the original models, indicating internal model validation (Tables 4 and 5). Findings from 10-fold cross-validation also produced similar AUCs to the original PLR model: 0.69 for stroke and 0.71 for major bleeding (Table 7). For external validation in the KPCO-II cohort, the models' intercepts and the regression coefficients were updated (S4 Table). Results of the model goodness-of-fit test (Table 7), discrimination (Table 7) and calibration (S6, S7 and S8 Figs) supported external validation for the PLR and Cox models.

Discussion
In this study of patients diagnosed with AF who were and were not initiated on warfarin therapy, we present a new methodology to predict individual combined benefit and harm outcomes of warfarin therapy. We utilized a PLR model to predict the individual benefit and harm outcomes due to its simplicity and flexibility, especially in predictor selection [22,23]. PLR modelling can incorporate individual baseline characteristics of patients to estimate individual probabilities of the combined benefit and harm outcomes. Compared with the decision tree model, another commonly used method for building predictions, PLR models have shown greater discrimination and predictive accuracy [44-49]. The PLR model for stroke and major bleeding included warfarin use, age, female sex, CHF, other cerebrovascular disease, hypertension, diabetes, prior major bleeding, prior stroke, renal disease, and concurrent use of antibiotics, antiplatelets, and GI drugs. Our model performance was acceptable and robust. Using the predictors we identified, the estimated probabilities of the potential outcomes can be computed. For example, if an 82-year-old woman taking warfarin had CHF, diabetes, renal disease and prior major bleeding, and used GI medications concurrently with warfarin, then her log(stroke/neither event) would be -0.85 and her log(major bleeding/neither event) would be -0.33. Subsequently, her estimated 3-year probability of stroke would be $\frac{e^{-0.85}}{1 + e^{-0.85} + e^{-0.33}} = 19.9\%$, her probability of major bleeding would be $\frac{e^{-0.33}}{1 + e^{-0.85} + e^{-0.33}} = 33.6\%$, and her probability of neither event would be $\frac{1}{1 + e^{-0.85} + e^{-0.33}} = 46.5\%$ [23]. By contrast, if she did not start warfarin therapy but all other factors were the same, her estimated probabilities of stroke, major bleeding and neither event would be 24.3%, 22.5% and 53.2%, respectively.
Likewise, her estimated 3-year probability of all-cause mortality with and without warfarin therapy initiation would be 6.9% and 24.4% respectively, using the Cox model.
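The conversion from the two log-odds to the three outcome probabilities in the worked example can be checked with a short sketch. The function name is hypothetical; the inputs are the rounded log-odds quoted above, so the results may differ from the reported percentages in the last digit.

```python
import math


def outcome_probabilities(logit_stroke, logit_bleeding):
    """Convert log-odds versus the 'neither event' referent into probabilities."""
    odds_stroke = math.exp(logit_stroke)
    odds_bleeding = math.exp(logit_bleeding)
    denominator = 1.0 + odds_stroke + odds_bleeding   # referent contributes 1
    return {
        "stroke": odds_stroke / denominator,
        "major_bleeding": odds_bleeding / denominator,
        "neither": 1.0 / denominator,
    }


# Log-odds from the worked example: 82-year-old woman taking warfarin.
for outcome, probability in outcome_probabilities(-0.85, -0.33).items():
    print(outcome, round(probability, 3))
```

Because all three probabilities share one denominator, they always sum to 1, which is what allows the model to present benefit and harm to a patient as a single set of mutually exclusive outcomes.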
In our prediction models, warfarin was associated with an increased risk of major bleeding and decreased risk of death, which is in accordance with previous findings [50,51]. However, we did not identify an association between warfarin use and decreased risk of stroke. A possible explanation for this unexpected observation is the lack of INR control measures, such as time in therapeutic range (TTR), in our prediction models. Prior research indicates that the full benefit of stroke risk reduction may require an individual TTR of at least 70% in warfarin users [52]. However, individual TTR results for patients in our cohorts could not be included in the models since TTR is not measured in warfarin non-users. Another possible explanation relates to our use of ICD-9-CM codes alone to identify stroke and bleeding outcomes without confirmatory chart review. The positive predictive values of ICD-9-CM codes for bleeding have been shown to be higher than those for stroke [53,54]; thus, the use of ICD-9-CM codes alone may have produced a high rate of false-positive strokes. In addition, a stroke history may have increased the likelihood that a given patient received warfarin to prevent further stroke and concurrently increased the likelihood that false-positive stroke ICD-9-CM codes were identified during administrative data acquisition.
The CHADS2/CHA2DS2-VASc and HAS-BLED scores are used worldwide in patients with AF to stratify the risk of stroke and major bleeding, respectively. However, these risk-stratification tools cannot provide the individual combined benefit and harm assessments needed by patients and physicians at inception of warfarin therapy or when concerns arise during ongoing use. Moreover, concerns have been expressed about their scoring algorithms and poor discrimination [55-59].
For instance, in one study that used patients with a CHA2DS2-VASc score of 0 (men) or 1 (women) as the reference group, the additional risk factors carried unequal risks of stroke, implying that they should be weighted differently in the scoring algorithm. The HRs ranged from 1.68 for vascular disease to 3.09 for age 65-74 years in men, and from 1.71 for hypertension to 3.03 for age 65-74 years in women [57]. Therefore, given the potentially different weighting of individual components of the scores, as well as the more detailed information provided by the individual components, we used individual risk factors, rather than gross risk scores, in our model construction.
Other studies have used the 'net benefit' approach of considering stroke and major bleeding outcomes simultaneously [19-21]. Unfortunately, GI bleeding risk was not considered, and the weighting factor reflecting the importance of ICH was chosen subjectively and arbitrarily in these studies. Additionally, while some studies have combined stroke and bleeding risk-stratification scores to calculate overall clinical outcome risks including stroke and major bleeding [60,61], they did not improve prediction of stroke and major bleeding beyond the individual stroke (CHADS2, CHA2DS2-VASc) or bleeding scores (HAS-BLED) [62]. In contrast, our study may provide insights into using a new methodology to take into account individual benefit-harm outcomes with warfarin therapy. Our PLR model calculates the specific probabilities of stroke and major bleeding at the same time, which may be more practical and acceptable in real-world clinical practice compared with using separate stroke and bleeding risk-stratification scores. Moreover, because our model produces individualized risk estimates for each patient based on various characteristics, it offers more personalized and detailed information for patients with AF than the population-level estimates associated with CHADS2, CHA2DS2-VASc, and HAS-BLED scores [23]. Thus, the PLR model may better facilitate patient-physician shared decision-making with regard to warfarin therapy initiation.
In our study, an unexpected inverse association between comorbid hypertension and stroke, major bleeding, and all-cause death was observed. During the model construction, we used either the ICD-9-CM codes or the antihypertensive drug surrogates, including angiotensin-converting enzyme inhibitors, angiotensin II receptor blockers, thiazides, beta-blockers, calcium channel blockers, and other antihypertensive purchases, to identify hypertension comorbidity (S5 Table). Additionally, we ran two post-hoc sensitivity analyses using different methods to define a hypertension diagnosis: ICD-9-CM codes only, and both ICD-9-CM codes and antihypertensive drug purchases. These two methods yielded the same predictors in the PLR and Cox models with extremely similar coefficients (S6 Table). Moreover, removing hypertension from the model entirely also yielded similar results (S7 Table for the PLR model; S8 Table for the Cox model). Therefore, the unexpected relationship between hypertension and outcomes requires further exploration.
The strengths of our study include the use of a large sample of patients with AF to construct and validate the prediction model. Moreover, model building, assessment, and validation included rigorous and detailed statistical analyses. Another strength is the effort made to control bias in study design and data analyses, to preclude misleading predictors from being included in the models. Nevertheless, our study also has several limitations. The majority of the data used in this study were from ICD-9-CM codes only, without confirmatory chart review of the diagnosis. Thus, data accuracy for baseline comorbidities may be less than optimal. Likewise, the incidence rates of stroke and major bleeding may be over- or underestimated. This could lead to false positive/negative values and weaken the findings based on the data. Additionally, we intended to predict four outcome quadrants (Table 1). However, the number of patients experiencing simultaneous stroke and major bleeding (n = 28) was insufficient for model construction. Another limitation is the lack of data from contemporary non-KPCO cohorts for model validation, which potentially limits the generalizability of the prediction model [27].

Conclusions
In this study, we introduce a new methodology for predicting individual combined benefit and harm outcomes associated with warfarin therapy for patients with AF. Should this approach be validated in other patient populations, it has potential advantages over existing risk stratification approaches as a patient-physician aid for shared decision-making.