Cardiovascular risk profile in individuals initiating treatment for overactive bladder – Challenges and learnings for comparative analysis using linked claims and electronic medical record databases

For managing overactive bladder (OAB), mirabegron, a β3 adrenergic receptor agonist, is typically used as second-line pharmacotherapy after antimuscarinics. Therefore, patients initiating treatment with mirabegron and antimuscarinics may differ, potentially impacting associated clinical outcomes. When using observational data to evaluate real-world safety and effectiveness of OAB treatments, residual bias due to unmeasured confounding and/or confounding by indication is an important consideration. Falsification analysis, in which clinically irrelevant endpoints are tested as a reference, can be used to assess residual bias. The objective in this study was to compare baseline cardiovascular risk among OAB patients by treatment, and assess the presence of residual bias via falsification analysis of OAB patients treated with mirabegron or antimuscarinics, to determine whether clinically relevant comparisons across groups would be feasible. Linked electronic health record and claims data (Optum/Humedica) for OAB patients in the United States from 2011–2015 were available, with index defined as first date of OAB treatment during this period. Unadjusted characteristics were compared across groups at index and propensity matching was conducted. Falsification endpoints (hepatitis C, shingles, community-acquired pneumonia) were compared between groups using odds ratios (ORs) and 95% confidence intervals (CI). The study identified 10,311 antimuscarinic- and 408 mirabegron-treated patients. Mirabegron patients tended to be older and were more often male, with more comorbidities. The analytic sample included 1,188 antimuscarinic patients propensity-matched to 396 mirabegron patients; after matching, no significant baseline differences remained. Estimates of falsification ORs were 0.7 (CI:0.3–1.7) for shingles, 1.5 (CI:0.3–8.2) for hepatitis C, 0.8 (CI:0.4–1.8) for the composite of shingles and hepatitis C, and 0.9 (CI:0.6–1.4) for pneumonia.
While propensity matching successfully balanced observed covariates, wide CIs prevented definitive conclusions regarding residual bias. Accordingly, further observational comparisons by treatment group were not pursued. In real-world analysis, bias-detection methods could not confirm that differences in cardiovascular risk in patients receiving mirabegron versus antimuscarinics were fully adjusted for, precluding clinically relevant comparisons across treatment groups.


Introduction
Overactive bladder (OAB) is characterized by urge urinary incontinence and urgency, nocturia, and high urinary frequency. The prevalence of OAB has been estimated to be 11.8% in the United States (US), with higher rates in older individuals. [1] While behavioral modifications including bladder training, pelvic floor training, and limitation of fluids are intended as the first line of treatment for OAB, pharmacologic intervention is a mainstay of OAB management. [2] To date, antimuscarinic therapies (including oxybutynin, tolterodine, solifenacin, flavoxate, fesoterodine, trospium, and darifenacin) have been the most common first-line pharmacologic treatment for OAB. [3] Mirabegron (Myrbetriq/Betmiga; Astellas Pharma) is a β3 adrenergic receptor agonist with demonstrated efficacy and safety in managing the symptoms of OAB. [4] In clinical practice, mirabegron is typically given as second-line pharmacotherapy after discontinuation or failure of therapy with antimuscarinics. [3] While clinical trials have shown mirabegron to be both efficacious and safe, [4] in two randomized, placebo-controlled studies of healthy volunteers, mirabegron was associated with dose-related increases in supine blood pressure with the currently marketed and maximum recommended dose of 50 mg. [5] The mean increase in systolic blood pressure (SBP)/diastolic blood pressure (DBP) was approximately 3.5/1.5 mm Hg greater than placebo. In three 12-week, phase 3, double-blind, placebo-controlled safety and efficacy studies of OAB patients receiving mirabegron 25 mg, 50 mg or 100 mg once daily, mean increases in SBP/DBP of approximately 0.5–1.0 mm Hg were observed compared to placebo. [5] Both SBP and DBP increases were reversible upon discontinuation of treatment. [5] It is important to determine whether findings from randomized controlled studies are also observed in a real-world setting.
In a real-world setting, integrated electronic health record (EHR) and claims data can also provide confirmation of dispensed (as opposed to prescribed) medications, as well as details of baseline cardiovascular risk profiles (e.g. vital signs, smoking status) that are not typically captured in billing claims but are required to inform any necessary statistical adjustments. To date, a real-world assessment of cardiovascular risk in OAB patients has not been conducted.
A key challenge in this setting is the potential for residual bias in observational data due to unmeasured confounders and/or confounding by indication, e.g. if patients receiving any OAB treatment are systematically different from patients receiving alternative therapies, or those who are untreated. Addressing residual bias is critical when using observational data to probe comparative outcomes. [6] In previous database analyses it was found that OAB patients initiating mirabegron tended to be older at treatment initiation, with a greater comorbidity burden and higher healthcare resource utilization, compared to those initiating treatment with antimuscarinics. [3] This is likely a result of mirabegron's typical positioning as a second-line pharmacological agent after failure of antimuscarinics. [3] The overarching aim of the study was, using the most appropriate statistical methodology for mitigating residual bias, to compare baseline cardiovascular risk profiles of OAB patients initiating antimuscarinics vs. mirabegron, and determine comparative cardiovascular outcomes, such as blood pressure change, between treatment groups. However, prior to undertaking that investigation, the team planned an unbiased a priori assessment of the feasibility of that comparative study, with a particular focus on determining whether potential residual bias could be present due to unmeasured confounding, including differences in prior treatment patterns and treatment switching. Should the initial feasibility assessment satisfy the study team that a rigorous study could be conducted, a larger outcome comparative analysis would then be undertaken. The objective of the study described here is to compare cardiovascular risk profiles of OAB patients initiating antimuscarinics vs. mirabegron; and to present the findings of the feasibility assessment for the comparative study of cardiovascular outcomes across treatment groups that the cardiovascular risk profiles informed.

Study design and patient population
The study was designed as a real-world, US-based retrospective cohort study of patients receiving treatment for OAB. Data were available from October 2011 to June 2015.
The overall study population was derived from all individuals diagnosed with OAB, based on the International Classification of Diseases, 9th Edition (ICD-9) codes that indicate a diagnosis of OAB (ICD-9 596.65, 596.51, 788.3, 788.31, 788.33, 788.41, 788.43, 788.63, 788.91). While there is no specific ICD-9 code for OAB, the proposed ICD-9 codes above are consistent with previously published research that evaluated OAB patients using real-world datasets. [7][8][9][10] For the feasibility assessment, patients were eligible for inclusion based on dispensation billing records for mirabegron or an antimuscarinic therapy (oxybutynin, solifenacin, tolterodine, flavoxate, fesoterodine, trospium, darifenacin) between October 2012 and December 2014 ("identification period"). Both EHR records of prescriptions written and billing claims for dispensed prescriptions were initially considered for patient identification; however, due to a large discrepancy indicating a high frequency of unfilled prescriptions (more than twice as many prescriptions were identified in the EHR versus the claims data), claims billing records were ultimately used to determine eligibility. Index date was defined as first prescription dispensation during the identification period. Patients with a diagnosis for OAB without a billing record for mirabegron or an antimuscarinic were included in the untreated cohort as of their first OAB-related health claim during the identification period. Data from October 2011 onwards were used to characterize medical and treatment history for each patient in the 12 months prior to index date. Follow-up included time from index date through June 2015, with a minimum potential follow-up of six months for those patients having an index date in December 2014. Follow-up times were censored for those individuals who died or left the claims database due to changing coverage.
Patients were included in the mirabegron or antimuscarinic treatment group based on the first treatment they received during the identification period, noting that treatment switching after index date may have occurred, and that patients may have received other OAB therapies prior to the identification period.
Further eligibility criteria required that patients:
• Had at least 12 months of continuous coverage in the claims data prior to index date in order to comprehensively assess risk factors and comorbidities at baseline;
• Were 18 years of age or older and had at least one baseline blood pressure (measured and reported using methodologies per usual clinical practice) recorded in the EHR within 90 days prior to index date.
The criteria for exclusion were:
• Pregnancy during the study period;
• Receipt of pharmacologic therapy for OAB during the year prior to the identification period while remaining untreated for OAB during the identification period;
• A recorded cardiovascular event (myocardial infarction, unstable angina, cardiovascular death, cerebrovascular accident, peripheral arteriopathy, aortic event, heart failure, coronary artery bypass grafting, atrial fibrillation, transient ischemic attack, percutaneous intervention, angioplasty) [11] within 30 days prior to the index date.
If the feasibility assessment indicated it reasonable to proceed with the cardiovascular outcomes study, the goal of those analyses would be to determine whether differences in SBP and/or DBP changes occur in patients receiving mirabegron compared to antimuscarinics, and the association of those changes with cardiovascular events. It was therefore also important to perform sample size calculations a priori, to determine whether the available number of patients meeting the inclusion and exclusion criteria would be sufficient, if the outcome of the feasibility assessment to identify residual bias justified proceeding with the cardiovascular outcomes study. Details of the calculations are included in Appendix A. At power of 0.80, a sample of 500 mirabegron patients and 1500 antimuscarinic patients would be required to detect a systolic blood pressure difference of 2.5 mm Hg assuming a 14 mm Hg standard deviation. When based on a power of 0.90, the sample size increased to 645 mirabegron patients and 1935 antimuscarinic patients.
All analyses were conducted in SAS version 9.4.

Data source
The study utilized an Optum integrated claims billing and EHR dataset from the US. The Optum claims dataset is widely used for pharmacoepidemiologic, pharmacoeconomic, and outcomes research studies in a variety of diseases, [12,13] including cardiovascular diseases [14] and OAB. [15] The Optum dataset includes the eligibility, medical, and pharmacy claims data from United Health, a large commercial health plan affiliated with Optum. The individuals included within this health plan are geographically diverse, from across the US, comprising approximately 3 to 4% of the US population. The database includes data from 2003 to 2015 and has almost 13 million registrants annually. Optum claims data were integrated with Humedica primary care EHR data for the subset of the Optum OAB population included in both datasets. Inclusion in the Humedica EHR is based on physician participation in the network. Reported EHR data include medications, laboratory results, vital signs, demographics, hospitalizations, outpatient visits, and physician notes. Claims data linked to the EHR can be used to identify those prescriptions that were actually dispensed, indicating that identified patients filled a prescription for the study medication. Hence medication use in this study was defined by claims rather than EHR data. By linking the Humedica EHR data to Optum claims data, it is possible to identify prescriptions filled by patients, along with cost and charge amounts associated with all covered healthcare utilization.

Bias reduction
Initial unadjusted comparisons conducted for the feasibility assessment were descriptive in nature and statistical comparisons were not made across treatment groups. A propensity score analysis was undertaken to mitigate the effects of bias within the observational data source. A logistic regression model was fit to characterize the likelihood (i.e., propensity score) of an individual being in the mirabegron treatment group, while adjusting for a range of demographic (age, sex, ethnicity, health plan type, geographic region variables) and clinical (smoking status, BMI, blood pressure, cholesterol, cardiovascular history, comorbidities, concomitant medications) variables. It was anticipated a priori that there would be substantially more antimuscarinic patients than mirabegron patients available in the dataset and that an n:1 matching algorithm would make most efficient use of the available data. Based on exploratory review of the eligible population sizes, antimuscarinic patients were propensity score matched to mirabegron patients in a 3:1 manner using a greedy matching algorithm to form the analytic sample. [16]

Quality assessment and falsification analysis
An assessment of data quality and completeness was required prior to undertaking further analyses. Due to the nature of US health insurance, OAB patients can enter and leave the enrollment plan over time, whereby a hiatus in coverage could be observed in either one or both data sources. Quality assessment of the data focused on a test sample of individuals untreated for OAB and included: comparison of all blood pressure, cholesterol, age, and BMI against plausible ranges (overall and stratified by age <65 vs. ≥65 and sex); and rates of missing values in variables included in the propensity score model.
Within the treated cohorts, additional quality checks included tabulation of censoring from the cohort and reasons for drop out; assessment of overlap and gaps across EHR and claims data; and assessment of treatment switches across antimuscarinics and mirabegron following index date, which enabled characterization of the proportion of follow-up time that a mirabegron patient was exposed to antimuscarinics and vice-versa.
While numerous analytic methods are available to address confounding by indication, adequately minimizing bias may not be feasible, particularly in the case of unmeasured confounders. [17][18][19] Falsification analysis is a method that has recently been proposed for assessing the potential for residual bias in analyzing a specific research question in an observational data source. [20,21] Within a falsification analysis, an endpoint thought to be unrelated to the exposure of interest is pre-specified, and the association between this outcome and the exposure is tested after statistical adjustments have been made. Any spurious residual association observed between the exposure and the falsification outcome suggests that bias may be present within the data, and additional analyses are not recommended unless this bias can be explicitly addressed.
Falsification endpoints with no known association with either medication class under study were pre-specified by clinical experts, and included shingles (ICD-9 053 [22]), hepatitis C virus (ICD-9 070.44 [23]), and community-acquired pneumonia (ICD-9 480.x-486.x [24]). For each falsification endpoint, odds ratios and corresponding 95% confidence intervals (CIs) of the association between treatment and outcome were calculated for the mirabegron and antimuscarinic propensity-matched cohorts.
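As a rough illustration of how an estimate for a single falsification endpoint could be obtained, the sketch below computes an odds ratio and a Wald-type 95% CI on the log scale from a 2×2 table. The function name and the counts are hypothetical and are not taken from the study, and the source does not state which CI method was used (the study analyses were run in SAS, and a matched design may call for conditional methods instead).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed patients with the endpoint, b: exposed patients without it,
    c: comparator patients with the endpoint, d: comparator patients without it.
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the log-scale interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for a rare endpoint in a 396 vs. 1,188 matched sample
or_, lower, upper = odds_ratio_ci(a=10, b=386, c=30, d=1158)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With only a handful of events per arm, the interval stays wide even when the point estimate sits at 1.0, which is why endpoint frequency matters when choosing falsification outcomes.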

Results
To derive the study sample, 34,243 individuals (1,417 ever treated with mirabegron and 32,836 ever treated with an antimuscarinic) were initially identified for potential inclusion (Fig 1), based on a diagnosis of OAB and a billing record for a dispensed prescription at any point in time (i.e. not necessarily during the study period). It is of interest to note that more than twice as many patients had a record of a written prescription for mirabegron or an antimuscarinic in the EHR (data not shown) without a record of a prescription being dispensed in the claims data, perhaps indicating concerns with primary adherence and a potential for bias through induced differences across treatment groups. More than half of antimuscarinic patients (n = 17,426) and approximately 60% of mirabegron patients (n = 847) were excluded because they were not dispensed a prescription during the study follow-up period (i.e. while these individuals received the medication of interest at some point during data coverage, they did not have a filled prescription during the study period). Of the resulting 15,980 patients (15,410 who received an antimuscarinic during the study period and 570 receiving mirabegron), most had 12 months' continuous data available prior to the index date in at least one of the EHR or claims databases, and were at least 18 years of age, with only a small number of exclusions related to these criteria (n = 16 exclusions for mirabegron and n = 740 exclusions for antimuscarinics). The requirement of a blood pressure measure being available within 90 days of index date resulted in 137 exclusions in the mirabegron arm and 4,163 exclusions in the antimuscarinic arm. A small number of exclusions were made due to recent pregnancy and/or cardiovascular events.
After applying all inclusion and exclusion criteria, the final sample was reduced to 408 mirabegron patients and 10,311 antimuscarinic patients. The antimuscarinic group was then further reduced to create a 3:1 propensity-matched sample to mirabegron. After 3:1 propensity matching, the final sample size was 396 in the mirabegron group and 1,188 in the antimuscarinic group. Thus, of the 15,980 OAB patients who received a prescription during the study period, approximately ten percent were eligible for the final analytic study population.
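To make concrete how a greedy 3:1 match can leave some treated patients without a full set of matches, the following minimal sketch pairs each treated patient with the nearest available comparators by propensity score, without replacement. It is not the study's implementation: the function name, the caliper, the processing order, and the toy scores are all assumptions introduced for illustration, and the source does not report whether a caliper was used.

```python
def greedy_match(treated, controls, ratio=3, caliper=0.1):
    """Greedy nearest-neighbor matching on propensity score, without replacement.

    treated, controls: dicts mapping a patient id to a propensity score.
    ratio: number of controls required per treated patient.
    caliper: maximum allowed absolute score difference (an assumed setting).
    Returns a dict mapping each matched treated id to its list of control ids.
    """
    available = dict(controls)  # copy so the caller's dict is not modified
    matched = {}
    for t_id, t_score in treated.items():  # processed in insertion order
        candidates = sorted(
            (abs(c_score - t_score), c_id)
            for c_id, c_score in available.items()
            if abs(c_score - t_score) <= caliper
        )
        if len(candidates) < ratio:
            continue  # not enough close controls: this treated patient is dropped
        chosen = [c_id for _, c_id in candidates[:ratio]]
        for c_id in chosen:
            del available[c_id]  # each control can be used only once
        matched[t_id] = chosen
    return matched


# Toy scores (hypothetical): "m3" has no comparators nearby and stays unmatched
treated = {"m1": 0.30, "m2": 0.32, "m3": 0.90}
controls = {"a1": 0.29, "a2": 0.31, "a3": 0.33, "a4": 0.30,
            "a5": 0.34, "a6": 0.35, "a7": 0.28, "a8": 0.10}
matches = greedy_match(treated, controls)
for t_id, c_ids in matches.items():
    print(t_id, "->", c_ids)
```

Because a greedy algorithm commits to each match as it goes, results can depend on the order in which treated patients are processed; production implementations often randomize or sort that order.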
During the quality assessment stage, baseline blood pressure measurements were available for most individuals in the untreated group (77.7%), with less than 0.5% of these data flagged as implausible or likely data entry errors. Women were more likely to have recorded blood pressure measures available (80.7% vs. 73.2%), and there was no noted difference in blood pressure measure availability by age. Within the treated cohorts, both treatment cohorts had more than a year of follow-up on average (555 days for antimuscarinic patients, 456 days for mirabegron patients). A gap in coverage of at least 30 days was noted in 3.6% of antimuscarinic patients and 2.2% of mirabegron patients. Drop-out during the study period occurred in 34.1% of antimuscarinic patients (30.9% leaving plan, 3.2% death) and 21.1% of mirabegron patients (19.1% leaving plan, 2.0% death). Four percent of antimuscarinic patients switched treatment to mirabegron after index date, compared to 18.9% of mirabegron patients who had at least one antimuscarinic claim after index date. The result of the quality assessment process was a decision to proceed to the next phase of the study.
Baseline demographic characteristics prior to propensity matching, for both untreated and treated OAB patients, are shown in Table 1. Mirabegron patients tended to be older (mean age 70.1 years vs. 66.7, p<0.0001), were more likely to be male (33.6% vs. 26.8%, p = 0.01), Caucasian (88.7% vs. 85.9%, p = 0.24), and to be covered by supplementary Medicare (56.1% vs. 34.3%, p = 0.15), relative to antimuscarinic patients. The most notable difference between untreated and treated OAB patients was that the former group tended to be younger, with a mean age of 59.9 years, and 15.3% under age 40 years.
Baseline unmatched clinical characteristics are reported in Table 2. Most distributions were similar across treatment groups. Differences among mirabegron patients include a higher rate of prior major adverse cardiovascular events (13.5% vs 11.7% in antimuscarinic patients and 10.1% in untreated patients in the year prior to baseline, p = 0.2687), and a higher rate of diabetes mellitus (42.4% vs 34.8% in antimuscarinic patients and 26.3% in untreated patients, p = 0.0021).
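Because p-values depend on sample size, covariate balance is commonly summarized with standardized differences, the metric this study reports before and after matching. The sketch below assumes the usual pooled-variance definition for a binary covariate (the source does not state the exact formula used) and applies it to the unmatched diabetes prevalences quoted above; the function name is illustrative.

```python
import math


def standardized_difference(p_treated, p_control):
    """Standardized difference for a binary covariate.

    p_treated, p_control: prevalence of the covariate in each group (0 to 1).
    Uses the average of the two binomial variances as the pooled variance.
    """
    pooled_variance = (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    if pooled_variance == 0:
        raise ValueError("covariate has no variation in either group")
    return (p_treated - p_control) / math.sqrt(pooled_variance)


# Diabetes mellitus before matching: 42.4% (mirabegron) vs. 34.8% (antimuscarinic)
d = standardized_difference(0.424, 0.348)
print(f"standardized difference = {d:.3f}")
```

Under these assumptions the unmatched diabetes imbalance comes out at roughly 0.16, above the 0.10 level conventionally taken to indicate adequate balance.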
The propensity score model included terms characterizing baseline demographics and clinical events, prior cardiovascular events, and pre-index blood pressure. After 3:1 propensity score matching, 396 of 408 mirabegron patients were matched to a corresponding sample of 1,188 antimuscarinic patients. The distributions of variables of interest before and after matching are reported in Table 3, and standardized differences before and after matching are shown in Fig 2. Prior to matching, a number of variables were statistically different across the two treatment groups. Generally, mirabegron patients were more likely to have comorbidities, while antimuscarinic patients were more likely to be receiving concomitant medications. Propensity score matching was successful at reducing covariate imbalance across the treatment groups. After matching, no variables were statistically different between the two groups and all but one of the post-matching standardized differences were less than 0.10.
The results of the falsification analysis on the propensity-matched population are shown in Fig 3. Due to the relative infrequency of shingles and hepatitis C virus, they were combined post hoc into a composite endpoint. There was no statistical evidence of a spurious association between any of the falsification endpoints and mirabegron or antimuscarinic treatment (all 95% confidence intervals span 1.0). However, the confidence intervals were wide, particularly for hepatitis C (point estimate 1.5, 95% confidence interval 0.27–8.2). Thus, due in part to the low sample size contributing to a lack of power, the presence of residual bias (as evidenced by an odds ratio different from 1.0) in the propensity-matched dataset could not be ruled out.

Discussion
This study aimed to characterize the cardiovascular risk profile of untreated, mirabegron-treated, and antimuscarinic-treated OAB patients, using an integrated claims/EHR dataset. An incremental analytical approach was implemented to ensure rigor and accuracy within the constraints of observational data. In a retrospective cohort of individuals with OAB, mirabegron patients were found to be older, with more comorbidities and more prior cardiovascular events relative to antimuscarinic patients. Mirabegron is typically prescribed as a second-line agent to antimuscarinics, due to either inadequate response or poor tolerability of antimuscarinics, or due to formulary rules, e.g. stepped therapy conditions. As such, unadjusted comparisons of treatment effectiveness or safety between mirabegron and antimuscarinics based on these data may be biased by differences in the distribution of baseline risk factors between treatment cohorts. Indeed, despite what appeared to be adequate propensity score matching, when residual bias was assessed through falsification analyses, the resulting confidence intervals were sufficiently wide that associations with falsification outcomes could not be ruled out. The small sample size ultimately eligible for study inclusion (due to the significant attrition caused by lack of overlap of subjects between the claims and EHR datasets, and application of the relatively limited exclusion criteria) contributed to limited power and to confidence interval width, preventing a definitive interpretation of results and conclusions.
Bias assessment for the cardiovascular outcomes study, the initiation of which would be based on the results of the feasibility assessment, was defined with two pre-specified stopping rules with respect to data quality and study feasibility. The first stopping rule instructed no further analyses if there was evidence of sufficiently implausible data (i.e. clinical data values that were physically impossible and assumed to be data entry errors, assessed on a case-by-case basis) or frequency of missing data as assessed by the initial data quality assessment. The second stopping rule was a determination of feasibility after propensity-score matching, based on achievement of covariate balance, available sample size, and results of falsification analysis. No assessment of primary or secondary endpoints for the cardiovascular outcomes study was conducted prior to confirming the results of the stopping rules, to ensure that outcome values had no influence on the decision to continue the study. In accordance with the second analytic stopping rule, the limited available sample size, together with the potential for residual bias identified by the falsification analyses, led to the conclusion that there was insufficient evidence to rule out bias in the available data, and that the cardiovascular outcomes study could not be robustly carried out at this time. Future real-world administrative database studies assessing clinical outcomes across OAB treatments will require careful accounting for the clinical differences inherent in comparing populations receiving a second-line agent to a first-line agent beyond that achieved by standard methods such as propensity scoring; data sources with larger sample sizes available may mitigate the limitations caused by low power observed here. By planning the cardiovascular outcomes study with pre-specified stopping rules, primary and secondary endpoint results data were not analyzed in any way prior to making stopping decisions for the study. This ensured information about results could not influence the decision of whether to proceed with further analyses.
Electronic health records have been used previously to assess blood pressure outcomes, [14,25–29] and are acknowledged by the US Food and Drug Administration for use in prospective clinical investigations of medications. [30] Integrated claims and EHR datasets can also be valuable research tools to assess clinical and pharmacoepidemiologic questions due to large sample sizes, long follow-up durations, and the inclusion of patients with complex medical needs who would not be likely candidates for clinical trial participation and cannot feasibly be assessed within an RCT framework. Indeed, in the search for robust real-world data, claims and EHR data sources are often cited as a powerful source of evidence, and the availability of large sample sizes is a frequently-noted feature. [31] That said, as reported here, rigorously identifying the study cohort and appropriately controlling for potential biases may impact the feasibility of using these data for complex clinical and pharmacoepidemiologic questions. Indeed, this study highlights that even for a relatively common condition such as OAB (with a study requirement for recorded claims data), sample size remained a limiting factor within a large administrative dataset.
With respect to the potential for selection bias in observational data sources, supplementing propensity scores with statistical techniques such as falsification analysis (also referred to as negative control analysis) [32] can help to assess whether an unbiased comparison is feasible with the available data, and can provide confidence in the interpretation of results. However, identification of appropriate endpoints can be challenging; in the case of OAB, the potential for an association between exposure and cardiovascular risk factors limits the availability of possible falsification endpoints with no plausible chance of association to treatment group. As a result, although expected frequency was a criterion used when selecting candidate falsification endpoints, the sample size was inadequate to conduct a conclusive falsification analysis. This highlights the importance of a large sample size, not only for attaining sufficient statistical power required for primary analyses, but also for bias assessment techniques. The methods described here may be used as a framework for other investigators who are considering real-world data to investigate other clinical and pharmacoepidemiologic research questions.
While data quality checks did not identify any notable concerns regarding data quality or conclusive evidence of residual bias, the magnitude of variability in falsification analysis results could not conclusively rule out the potential for residual bias in the sample after statistical adjustment. The most important concern with the data, eventually leading to the decision to not proceed with analysis, was the small sample of eligible mirabegron patients, including the attrition induced by individuals who were prescribed therapy but did not fill their prescription. In particular, the low number of patients relative to available mirabegron clinical trial populations, for which a similar length of follow-up is available for samples of 400-800 patients, [33][34][35][36] did not justify the potential for additional biases associated with observational research. It was notable that, of patients identified as receiving mirabegron or an antimuscarinic in EHR data, more than half were excluded for not having corresponding claims records. This may reflect problems with primary adherence to OAB medications, with patients choosing not to access prescribed medications; however, this feature of the data also highlights a potential limitation with linked EHR and claims data generally, in that dispensed prescriptions may not be reflected in claims data due to plan discontinuation and/or other changes in coverage.
While the relatively small sample size of mirabegron patients coupled with the low prevalence of the falsification endpoints of interest did not directly lead to the decision to stop the cardiovascular outcomes study, it may have contributed to the interpretation that falsification endpoints were unable to definitively rule out residual bias, as determined by the wide confidence intervals rather than point estimates. As such, a similar analysis in a larger sample may have been found sufficient to warrant continuation with the cardiovascular outcomes study. A recently-conducted US Food and Drug Administration mini-sentinel study acknowledged similar limitations; they did not find a difference in risk of acute myocardial infarction or stroke between new users of mirabegron vs. oxybutynin, while noting limitations due to available follow-up time and sparseness of outcomes. [37] While long-term follow up of the real-world safety and effectiveness profiles of mirabegron and antimuscarinics warrants further consideration, the limited scope of available data is challenging for comparative analyses. The methodology presented here describes a framework for any treatment comparative analysis using real world observational data where treatment attributes may be at risk of residual confounding bias.