A Comparison of Cost Effectiveness Using Data from Randomized Trials or Actual Clinical Practice: Selective Cox-2 Inhibitors as an Example

Tjeerd-Pieter van Staa and colleagues estimate the likely cost effectiveness of selective Cox-2 inhibitors prescribed during routine clinical practice, as compared to the cost effectiveness predicted from randomized controlled trial data.


Introduction
Many countries require health technology assessments when deciding on adopting new healthcare technologies. Recently, the American College of Physicians recommended the establishment of an organization for the generation and review of cost-effectiveness analyses [1]. In England and Wales, formal cost-effectiveness analyses are now required and several years ago the National Institute for Health and Clinical Excellence (NICE) was established to balance the financial costs and clinical benefits of health technologies and evaluate their cost effectiveness [2,3]. It would be of interest to review the experience in England and Wales and assess whether previous cost-effectiveness analyses adequately informed and guided medical practice.
Selective cyclooxygenase-2 inhibitors (coxibs) ranked, before September 2004, among the most commonly used medications in the world. They were developed to minimize the upper gastrointestinal (GI) side-effects of conventional nonsteroidal anti-inflammatory drugs (NSAIDs). There have been at least 33 published studies that evaluated the cost effectiveness of coxibs (celecoxib, rofecoxib, etoricoxib, or lumiracoxib) relative to that of conventional NSAIDs. Although the use of coxibs has now changed following the findings of cardiovascular harm [37], they provide a good example of a drug with recently published cost-effectiveness analyses that were used to inform prescribing policies [8,36]. Randomised clinical trial (RCT) data were used for the estimates of the rates of upper GI events in all cost-effectiveness studies, except those conducted prior to the completion of large RCTs [4][5][6][7]. RCT data are still widely used not only for efficacy estimates but also for costs and incidence estimates [38][39][40]. While RCTs undoubtedly provide the best evidence for efficacy, they may not be the best source of costing data [41]. In addition, it is unclear whether RCT estimates of the incidence of outcomes represent the experience of patients in actual clinical practice [42]. However, there has been little empirical investigation of these issues. The objective of this study was to evaluate the external validity of published cost-effectiveness studies by comparing the data used in these studies to observational data from actual clinical practice, and to assess whether these studies should have been used to inform prescribing policies. Coxibs were used as an example.

Design of the Cost-Effectiveness Model
A basic cost-effectiveness model was developed evaluating two alternative strategies: prescription of a conventional NSAID or of a coxib. The model estimated the incremental cost of preventing one upper GI event with coxibs in a large representative UK population that had been prescribed anti-inflammatory medication during 1990-2006 for any medical condition. The prescription costs and the number of cases with upper GI events during current exposure to coxibs were compared in a simulation model to those with conventional NSAIDs.

Risks of Upper GI Events
The upper GI events included clinically symptomatic gastroduodenal ulcers and complications such as upper GI hemorrhage. Two data sources were used to estimate the risks of upper GI events. Firstly, data were derived from existing RCTs. All published cost-effectiveness analyses conducted since 2000 used RCT data for the estimates of the risks of upper GI events. Literature was searched for large RCTs (including over 2,000 patients) or meta-analyses of RCTs with prevention of upper GI events as primary outcome. A total of 11 large RCTs or meta-analyses were identified [43][44][45][46][47][48][49][50][51][52][53]. Secondly, data from actual clinical practice were used to estimate the absolute risk of upper GI events among patients using NSAIDs and coxibs. All patients aged 40 y or older prescribed conventional NSAIDs or coxibs and registered in the General Practice Research Database (GPRD) were identified. The GPRD comprises the anonymized computerized medical records of general practitioners (GPs). GPs play a key role in the UK health care system, as they are responsible for primary health care and specialist referrals. Patients are affiliated to a practice, which centralizes the medical information from the GPs, specialist referrals, and hospitalizations. The data recorded in the GPRD include demographic information, prescription details, clinical events, preventive care provided, specialist referrals, and hospital admissions and their major outcomes [54]. GPRD data collection started in 1987 and currently includes data on over 10 million patients. Two outcomes were measured and considered separately in the analyses. The first outcome concerned a GPRD record of upper GI events (based on a GP diagnosis or on a hospital or consultant letter as recorded into GPRD by the GP). The second outcome concerned hospitalizations for upper GI events, as obtained from the national registry of hospital admissions in England (Hospital Episode Statistics). Each hospital records the dates of admission and discharge and diagnoses of all hospitalizations (data from 2001 to 2006 were used). These hospital data can now be linked individually and anonymously to patients in English GPRD practices. The hospitalizations for upper GI events included the ICD-10 codes for gastric, duodenal, peptic, or gastrojejunal ulcer and gastritis or duodenitis (K25-K29).
The GPRD study population was followed from the first NSAID prescription to the patient's death, the patient's transfer out of the general practice, or the last GPRD data collection available for this study (first quarter of 2006), whichever date came first. The follow-up of the study population was divided into periods of current and past exposure, with patients moving between these exposures. Current exposure was the time period starting at the date of a prescription and lasting up to 3 mo after the end of the prescription duration. On average, prescriptions for conventional NSAIDs and coxibs provided for a treatment of 28 d. Past exposure was the remaining time of the follow-up period of a patient (i.e., the time distant from a prescription). In this population, the incidence rates of upper GI events (i.e., the number of cases per 1,000 person-years) were estimated during current and past exposure overall and by age, gender, exposure characteristics, and GI risk factors. Poisson regression was used to estimate the relative risk (RR) of upper GI events during current compared to past exposure. All these analyses were done separately for conventional NSAIDs and coxibs. In the analysis of conventional NSAIDs, patients were censored at the first coxib prescription.
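The exposure window and the incidence-rate definition above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and counts are hypothetical, the 3 mo grace period is approximated as 91 d, and the crude rate ratio shown is only a stand-in for the Poisson regression used in the study.

```python
from datetime import date, timedelta

GRACE_DAYS = 91  # assumption: "3 mo" approximated as 91 days


def current_exposure_end(start: date, duration_days: int) -> date:
    """Last day counted as current exposure for one prescription."""
    return start + timedelta(days=duration_days + GRACE_DAYS)


def incidence_per_1000_py(cases: int, person_years: float) -> float:
    """Number of cases per 1,000 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * cases / person_years


def crude_rate_ratio(cases_current: int, py_current: float,
                     cases_past: int, py_past: float) -> float:
    """Unadjusted ratio of current-exposure to past-exposure incidence."""
    return (incidence_per_1000_py(cases_current, py_current)
            / incidence_per_1000_py(cases_past, py_past))


# Hypothetical counts, chosen only to illustrate the arithmetic.
window_end = current_exposure_end(date(2005, 1, 1), 28)
rate_current = incidence_per_1000_py(38, 10_000)
rate_past = incidence_per_1000_py(19, 10_000)
rr = crude_rate_ratio(38, 10_000, 19, 10_000)
print(window_end, rate_current, rate_past, rr)
```

A regression model would additionally adjust this ratio for age, gender, and the other covariates listed in the text.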

Exposure Characteristics
The published cost-effectiveness studies estimated the cost effectiveness for daily treatment for continuous periods of time. The large RCTs all evaluated long-term NSAID exposure (ranging from 3 mo to 3 y) in patients with either rheumatoid arthritis (RA) or osteoarthritis (OA), requiring chronic or continuous NSAID therapy for the duration of the trial.
The longitudinal prescription histories in GPRD were used to determine the exposure characteristics (daily or intermittent and short- or long-term use). The medication possession ratio (i.e., the proportion of time covered by medication use) was estimated for each NSAID prescription that had a prior prescription in the 6 mo before. The medication possession ratio was the expected duration of NSAID exposure of the previous prescription divided by the time between these two prescriptions. Prescriptions that were issued at least 6 mo after the previous NSAID prescription were classified as exposure with long gaps.
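As an illustration of the medication possession ratio described above, the following Python sketch classifies a single prescription relative to the previous one. The function name, the category labels, and the approximation of 6 mo as 183 d are assumptions made for this example, not details taken from the study.

```python
from datetime import date
from typing import Optional, Tuple

LONG_GAP_DAYS = 183  # assumption: "6 mo" approximated as 183 days


def classify_prescription(prev_date: Optional[date],
                          prev_duration_days: Optional[int],
                          this_date: date) -> Tuple[str, Optional[float]]:
    """Return (category, medication possession ratio or None).

    The ratio is the expected duration of the previous prescription
    divided by the time between the two prescriptions. It is only
    computed when the previous prescription was issued within 6 mo.
    """
    if prev_date is None:
        return "no_prior", None
    gap_days = (this_date - prev_date).days
    if gap_days <= 0:
        raise ValueError("this_date must be after prev_date")
    if gap_days >= LONG_GAP_DAYS:
        return "long_gap", None
    if prev_duration_days is None:
        # Missing expected duration: kept as a separate category.
        return "missing_duration", None
    # Not capped at 1.0: an early refill yields a ratio above 1.
    return "short_gap", prev_duration_days / gap_days


# Hypothetical example: a 28 d prescription refilled after 40 d.
category, mpr = classify_prescription(date(2005, 1, 1), 28, date(2005, 2, 10))
print(category, mpr)
```

In this example the patient had medication for 28 of the 40 d between prescriptions, giving a ratio of 0.70.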
First-time exposure was the first NSAID prescription issued at least 1 y after start of GPRD data collection. At each NSAID prescription, the number of NSAID prescriptions in the 1 y before was also calculated, approximating the prior exposure duration (short-term, ≤4; medium-term, 5-11; and long-term exposure, ≥11 prior prescriptions). Prescriptions with missing information on the expected duration of use were classified into a separate category.
In the UK, ibuprofen is available over the counter without prescription. Patients need to pay a charge for GP prescriptions, except the elderly and patients with low incomes. Further details on the prescribing patterns of conventional NSAIDs and coxibs can be found elsewhere [55,56].

Risk Factors for Upper GI Events
In the GPRD population, the GI risk factors were estimated at each prescription, including age of 65 y or older, recent prescribing (in the 6 mo before) of oral glucocorticoids or anticoagulants, and a history of peptic upper GI bleeding, ischemic heart disease, hypertension, heart, renal or liver failure, or diabetes mellitus. These risk factors were included in NSAID prescribing guidelines from NICE [57]. Additional upper GI risk factors measured in this study included calendar year, the number of visits to the GP in the 6 to 12 mo before, smoking history, use of alcohol and body mass index (where available), medical history of OA or RA, and concomitant prescribing of aspirin or gastro-protective (ulcer-healing) drugs (British National Formulary 1.3).

Clinical Effects of Coxibs
In order to derive an estimate of the beneficial effects of coxibs on the risk of upper GI events, a meta-analysis of 11 RCTs was used. This meta-analysis reported a relative risk reduction (RRR) of 51% of clinically symptomatic ulcers with coxibs (RR of 0.49; 95% confidence interval [CI] 0.38-0.62) [58]. We assumed in the simulation model that the risk of upper GI events, as observed in GPRD in users of conventional NSAIDs, would have been reduced by 51% if a coxib had been prescribed. Conversely, we assumed that the risk during current coxib exposure in GPRD would have increased by 51%, if a conventional NSAID had been prescribed. In the main analysis, it was assumed that the risk reduction due to coxibs would start immediately, similar to the assumptions in the published cost-effectiveness studies. As several RCTs reported an onset of coxib effect only 1 to 6 mo after starting exposure [45,46,52,53] (i.e., diverging of the risks between the coxib and control groups), a sensitivity analysis was conducted assuming a delayed onset of effect (after 1 or 6 mo).
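The way an RRR translates into events avoided, with or without a delayed onset of effect, can be sketched in Python as follows. This is a simplified illustration rather than the study's model: it assumes a constant event rate over the exposure period and no coxib benefit at all before the onset delay, and the function name and input values are hypothetical.

```python
RRR = 0.51  # relative risk reduction reported by the meta-analysis (RR 0.49)


def events_avoided(rate_per_1000_py: float,
                   exposure_years: float,
                   rrr: float = RRR,
                   onset_delay_years: float = 0.0) -> float:
    """Expected upper GI events avoided for one patient.

    Assumes a constant event rate on the conventional NSAID and no
    benefit until `onset_delay_years` of exposure have elapsed.
    """
    effective_years = max(0.0, exposure_years - onset_delay_years)
    return rate_per_1000_py / 1000.0 * effective_years * rrr


# Hypothetical patient: 10 events per 1,000 person-years, 1 y of exposure.
immediate = events_avoided(10.0, 1.0)
delayed_6mo = events_avoided(10.0, 1.0, onset_delay_years=0.5)
short_course = events_avoided(10.0, 0.25, onset_delay_years=0.5)
print(immediate, delayed_6mo, short_course)
```

Under these assumptions a short course that ends before the effect begins avoids no events, which is one reason a delayed onset worsens cost effectiveness for intermittent users.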

Prescription Costs
Prescription costs of each NSAID and coxib prescription in GPRD were estimated using the prescribed number of tablets and the 2006 cost data from the British National Formulary. The cost data were converted from British pounds into US dollars using an exchange rate of £1 to US$2 (approximately the exchange rate at the end of 2006). As prescription costs varied substantially and the use of a single cost difference would be incorrect, the prescriptions of conventional NSAIDs and coxibs were ranked by costs and the incremental cost was based on the cost difference at each rank between conventional NSAIDs and coxibs. In a sensitivity analysis, the cost estimates from a recent UK assessment report were used (US$5.60 per month for a conventional NSAID and US$41.28 for a coxib) [31].
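One possible reading of the rank-based cost comparison is sketched below in Python. The text does not specify how ranks were matched between the two groups, which differ in size, so matching at equal percentile ranks is an assumption of this example, and the function names and cost values are hypothetical.

```python
from typing import List, Sequence

GBP_TO_USD = 2.0  # exchange rate stated in the text (end of 2006)


def gbp_to_usd(cost_gbp: float) -> float:
    """Convert a cost in British pounds to US dollars."""
    return cost_gbp * GBP_TO_USD


def cost_at_rank(sorted_costs: Sequence[float], fraction: float) -> float:
    """Cost found at a rank fraction in [0, 1) of an ascending list."""
    idx = min(int(fraction * len(sorted_costs)), len(sorted_costs) - 1)
    return sorted_costs[idx]


def incremental_costs_by_rank(nsaid_costs: Sequence[float],
                              coxib_costs: Sequence[float],
                              n_ranks: int = 100) -> List[float]:
    """Coxib minus NSAID cost at each of `n_ranks` matched rank midpoints."""
    nsaid = sorted(nsaid_costs)
    coxib = sorted(coxib_costs)
    diffs = []
    for r in range(n_ranks):
        fraction = (r + 0.5) / n_ranks
        diffs.append(cost_at_rank(coxib, fraction) - cost_at_rank(nsaid, fraction))
    return diffs


# Hypothetical prescription costs in US$.
diffs = incremental_costs_by_rank([40, 5, 20, 10], [60, 20, 80, 40], n_ranks=4)
print(diffs, gbp_to_usd(8.90))
```

The point of ranking is that the cheapest coxib prescriptions are compared with the cheapest conventional NSAID prescriptions, rather than applying one average difference to every patient.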

Simulation Model
Simulation methodology was used to estimate the incremental cost of preventing one upper GI event during current exposure to coxibs. The number of upper GI cases avoided by coxibs was based on the RRR of the drug effect and the patient-specific incidence of upper GI events as estimated in the Poisson regression. The random variability was determined as follows. The event probabilities were randomly selected from a normal distribution on the basis of its mean and standard deviation. The coxib RRR used in each simulation was randomly selected from a normal distribution based on the RRR and 95% CI reported in the literature [58]. The simulation was repeated 250 times and nonparametric bootstrapping techniques were then used to estimate the 95% CIs (i.e., the 2.5% and 97.5% percentiles) [59].
Table 1 shows the rate of upper GI events in the large RCTs of coxibs. Study patients were restricted to those who required long-term NSAID exposure and the indication for treatment was mostly OA or RA. Neither the CLASS nor the VIGOR study applied "intention to treat" statistical analyses; both restricted the analyses to events that occurred during treatment or within 14 d of discontinuation of treatment.
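The simulation procedure can be illustrated with a minimal Monte Carlo sketch in Python. It is a simplification under stated assumptions, not the authors' implementation: it uses a single population-level event rate instead of patient-specific rates, takes plain percentiles of the simulated values instead of a bootstrap, truncates draws to keep them positive, and uses hypothetical input values and function names throughout.

```python
import random
import statistics
from typing import List, Tuple

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% CI


def sd_from_ci(lower: float, upper: float) -> float:
    """Standard deviation implied by a symmetric 95% CI on a normal scale."""
    return (upper - lower) / (2 * Z_975)


def simulate_cost_per_event(rate_mean: float, rate_sd: float,
                            rrr_mean: float, rrr_sd: float,
                            incremental_cost_per_py: float,
                            n_sims: int = 250, seed: int = 0) -> List[float]:
    """Simulated cost of preventing one event (rates are per person-year)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        # Truncation below is an assumption to avoid non-positive draws.
        rate = max(rng.gauss(rate_mean, rate_sd), 1e-9)
        rrr = min(max(rng.gauss(rrr_mean, rrr_sd), 1e-9), 1.0)
        results.append(incremental_cost_per_py / (rate * rrr))
    return results


def percentile_interval(values: List[float]) -> Tuple[float, float]:
    """Approximate 2.5% and 97.5% percentiles by nearest lower rank."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[int(0.025 * (n - 1))], ordered[int(0.975 * (n - 1))]


# RRR of 0.51 with 95% CI 0.38-0.62 follows from the reported RR of 0.49
# (0.38-0.62); the rate and cost inputs are hypothetical.
rrr_sd = sd_from_ci(0.38, 0.62)
sims = simulate_cost_per_event(0.005, 0.0005, 0.51, rrr_sd, 380.0)
low, high = percentile_interval(sims)
print(statistics.mean(sims), low, high)
```

With both standard deviations set to zero the function returns the deterministic value, incremental cost divided by rate times RRR, for every run, which is a convenient check on the arithmetic.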

Results
The GPRD study population included 971,426 patients prescribed conventional NSAIDs and 148,592 prescribed coxibs. A medical history of RA or OA was present in 23.0% of the conventional NSAID users and 45.9% of the coxib users. They were given a total of 8.5 million conventional NSAID prescriptions and 0.9 million coxib prescriptions. The longitudinal prescription histories indicated that a large proportion of patients used the NSAIDs intermittently. Only about 34.5% of conventional NSAID and 44.2% of coxib prescriptions were given to patients with enough medication for longer-term daily exposure (Table 2). The RRs of upper GI events during current exposure (compared to past exposure) were higher in those with continuous NSAID exposure and lower with incidental exposure. As shown in Table 3, the rate of upper GI events (as recorded by the GP) and of upper GI hospitalizations during current exposure to conventional NSAIDs decreased over calendar time by 5%-8% per year (p-value for tests of linear trend <0.0001 and 0.04, respectively). The rate of upper GI hospitalizations among conventional NSAID users in GPRD during current exposure was 12-fold lower than the rate reported in the VIGOR RCT (3.8 and 45.0 per 1,000 person-years, respectively).
The mean cost of a conventional NSAID prescription was US$17.80 (range of US$4.56 at 5th percentile to US$47.36 at 95th percentile). For coxibs, the mean cost was US$47.04 (range from US$18.62 to US$83.96). The mean incremental cost of replacing a conventional NSAID with a coxib was US$29.24. The mean cost of preventing one clinical upper GI event by substituting a coxib for the conventional NSAID was US$104k (95% CI US$74-146k) using GPRD estimates for the risk of upper GI events (Table 4). The cost effectiveness varied substantially by calendar year and exposure characteristics (Figure 1). As shown in Table 4, there was a large heterogeneity across the study population in the costs of preventing one upper GI event. In patients with two or more upper GI risk factors, 71.9% of the prescriptions had a cost below US$100k per case avoided in long-term users, compared with 36.6% in intermittent users (with long gaps).
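The quantities in this paragraph are linked by a simple relationship, written out here as a sketch. The symbols are introduced for illustration only and are not the paper's notation: \Delta C is the incremental prescription cost over the exposure period, I the incidence of upper GI events on a conventional NSAID, T the exposure time, and RRR the relative risk reduction with coxibs.

```latex
\text{cost per event avoided} \;=\; \frac{\Delta C}{I \cdot T \cdot \mathrm{RRR}},
\qquad
\Delta C_{\text{mean, per prescription}} \;=\; \mathrm{US\$}\,47.04 - \mathrm{US\$}\,17.80 \;=\; \mathrm{US\$}\,29.24
```

Because I appears in the denominator, a lower baseline incidence raises the cost per event avoided even when the incremental cost and the RRR are unchanged.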
The cost-effectiveness estimates worsened with a delayed coxib effect (Table 5). Conversely, the cost effectiveness of coxibs improved substantially when using RCT data for the risk of upper GI events (the mean cost was US$20k using the CLASS RCT [46] and US$16k using the VIGOR RCT [45]).

Discussion
Health technology assessments frequently use data from randomized trials for estimates of absolute risks of events and patterns of drug use. Using coxibs as an example, we have shown that cost-effectiveness analyses produced markedly different results depending on the source of the data used in the modeling. The cost effectiveness of coxibs was far worse when the analyses were based on data from actual clinical practice rather than RCTs. The use of data from actual clinical practice rather than RCTs would have radically altered the conclusions of health technology appraisals of coxibs.
There are several reasons for the substantive differences in results using actual clinical practice or RCT data. The incidence of upper GI events was lower among patients in GPRD compared to those in RCTs. In GPRD, there was an almost 3-fold reduction over calendar time in the incidence of upper GI events. This secular trend is consistent with that observed in Canada for the rate of hospital admission for upper GI events [60]. Furthermore, the cost-effectiveness analyses evaluated long-term daily use of coxibs in patients with RA or OA, while most patients in actual clinical practice did not have these conditions or used NSAIDs intermittently or for short periods of time. A further difference in the results of cost effectiveness may be related to the assumptions for prescription costs. Single estimates for costs were used in published cost-effectiveness models, while in daily practice there is a substantive variability in prescription costs for NSAIDs. Lastly, the published coxib cost-effectiveness studies described simple scenarios of drug exposure and event probabilities assuming uniformity in the population, while this study found a huge variability between patients in type of NSAID exposure, incidence of upper GI events, and prescription costs.
Table 4 footnote. Each NSAID prescription was classified according to first-ever use, long gap (previous prescription at least 6 mo before), and short gap (previous prescription within the last 6 mo). The medication possession ratio was estimated for the prescriptions issued after a short gap and divided into very low (<0.40), low (0.40-0.59), moderate (0.60-0.79), and high (0.80+). Short-term use was defined as ≤4 prescriptions in the 1 y before, medium-term as 5-11, and long-term as ≥11 prior NSAID prescriptions. Rx, prescription. doi:10.1371/journal.pmed.1000194.t004
Table 5. Sensitivity analyses of the population mean of the cost per case avoided with coxibs using different assumptions for onset of coxib effect and event probabilities.
In this study, a large proportion of the patients with a major upper GI risk factor, recommended to be treated with coxibs in the UK [57], had a cost per upper GI event avoided in excess of US$100k. The best strategy for targeting coxibs cost-effectively to heterogeneous populations has not yet been established. The use of coxibs has now changed following the findings of cardiovascular harm [37]. This study did not address the appropriate prescribing of coxibs on the basis of our current understanding of these cardiovascular risks.
RCTs provide the best evidence for establishing the efficacy (relative effects) of a treatment and have high internal validity due to randomization and blinding. But randomization and blinding do not ensure that the absolute event probabilities and costs, as observed in an RCT, will represent those in actual clinical practice, or that RCTs have external validity. The "real world" includes an incredible diversity and complexity [61], while the "world of RCTs" applies strict criteria for patient inclusion and for treatment exposure. RCTs often have an artificial design, with more tests conducted and increased patient monitoring. Also, patients may not comply with treatment instructions particularly well in the "real" world, increasing costs and decreasing the benefits. Thus, the absolute figures obtained from an RCT may very well deviate from and not represent the "real world." On the other hand, observational studies may provide reasonably good estimates of absolute event probabilities and costs in patients in actual clinical practice, but have major limitations in attributing causality and estimating the relative effects of a drug treatment, principally owing to confounding. Rather than considering RCTs as the ideal evidence for all information, cost-effectiveness studies could use meta-analyses of RCT data for the drug effect estimates and observational data for the absolute event probabilities and costs [62]. In addition to providing a better context, this approach would also limit the possibility that the best RCT data are selected for the cost-effectiveness analyses [63]. An alternative and even better approach would be to use large pragmatic RCTs for cost-effectiveness models. Pragmatic RCTs are conducted with patients who represent the full spectrum of the population to which the treatment might be applied and with interventions that have real-life (rather than ideal) compliance [64].
Cost-effectiveness analyses that are intended to guide medical practice should consider the characteristics of all possible patient subgroups that may be provided with the new technology. As an example, the prevalence of risk factors, the incidence of upper GI events, and the exposure characteristics of conventional NSAID users in actual clinical practice could have been described prior to assessing the cost effectiveness of coxibs. Such an analysis would have noted the selective characteristics of the patients enrolled in the large coxib RCTs and differences in exposure characteristics. Few patients in GPRD used conventional NSAIDs in the manner tested in the coxib RCTs (i.e., long-term use with higher daily doses). Patients may not require regular treatment, may not comply with dosage instructions, or may not persist with treatment. A second consideration for cost-effectiveness studies should be to evaluate the extent to which RCT evidence can be generalized and extrapolated to each of these various patient subgroups that may be provided with the new technology in actual clinical practice. As an example, it would have been noted that most conventional NSAID users would not have been eligible for inclusion into the large coxib RCTs and that there is rather limited evidence for beneficial effects of coxibs with short-term or intermittent use (as done by most patients). While it may be impossible to conduct RCTs in patients who use a treatment intermittently or who comply less (because of the required sample size), the uncertainty in generalizing RCT efficacy estimates to populations more diverse in patient and treatment characteristics should be considered explicitly [65]. None of the 33 published coxib cost-effectiveness studies analysed the external validity of the assumptions used. They did not provide any guidance on the prescribing of coxibs to the majority of patients using conventional NSAIDs in actual clinical practice (e.g., those with short-term or intermittent use). The field of health technology assessments should move from evaluating cost efficacy in ideal (hypothetical) populations with ideal interventions to cost effectiveness in real populations with pragmatic interventions.
One of the key limitations of this study was that the classification of upper GI events may have differed between RCTs and GPRD/Hospital Episode Statistics. In most of the large RCTs, all potential upper GI events were adjudicated in a standard manner. In the CLASS celecoxib RCT, only one-third of the potential cases were included in the analysis [46]. GPRD is based on information diagnosed and collected in actual clinical practice. This lack of case adjudication may have overestimated the rate of upper GI events in GPRD. On the other hand, there may have been under-diagnosis and/or under-recording in GPRD. However, clinically significant events are generally well recorded in GPRD, as documented by various validation studies [54]. Specifically, the validity of the diagnosis of upper GI bleeding in the GPRD records was assessed in a sample of 96 people with a diagnosis of upper GI bleeding recorded in their electronic records. Hospital records were reviewed and the diagnosis confirmed in 95 out of the sample of 96 patients [66].
In conclusion, the coxib cost-effectiveness studies lacked external validity and more realistic estimates for event rates and costs could have produced markedly different results, sufficient to have led to different prescribing guidelines. External validity should be an explicit requirement for cost-effectiveness analyses.

Editors' Summary
Background. Before a new treatment for a specific disease becomes an established part of clinical practice, it goes through a long process of development and clinical testing. This process starts with extensive studies of the new treatment in the laboratory and in animals and then moves into clinical trials. The most important of these trials are randomized controlled trials (RCTs), studies in which the efficacy and safety of the new drug and an established drug are compared by giving the two drugs to randomized groups of patients with the disease. The final hurdle that a drug or any other healthcare technology often has to jump before being adopted for widespread clinical use is a health technology assessment, which aims to provide policymakers, clinicians, and patients with information about the balance between the clinical and financial costs of the drug and its benefits (its cost-effectiveness). In England and Wales, for example, the National Institute for Health and Clinical Excellence (NICE), which promotes clinical excellence and the effective use of resources within the National Health Service, routinely commissions such assessments.
Why Was This Study Done? Data on the risks of various outcomes associated with a new treatment are needed for cost-effectiveness analyses. These data are usually obtained from RCTs, but although RCTs are the best way of determining a drug's potency in experienced hands under ideal conditions (its efficacy), they may not be a good way to determine a drug's success in an average clinical setting (its effectiveness). In this study, the researchers compare the data from RCTs that have been used in several published cost-effectiveness analyses of a class of drugs called selective cyclooxygenase-2 inhibitors ("coxibs") with observational data from actual clinical practice. They then ask whether the published cost-effectiveness studies, which generally used RCT data, should have been used to inform coxib prescribing policies. Coxibs are nonsteroidal anti-inflammatory drugs (NSAIDs) that were developed in the 1990s to treat arthritis and other chronic inflammatory conditions. Conventional NSAIDs can cause gastric ulcers and bleeding from the gut (upper gastrointestinal events) if taken for a long time. The use of coxibs avoids this problem.
What Did the Researchers Do and Find? The researchers extracted data on the real-life use of conventional NSAIDs and coxibs and on the incidence of upper gastrointestinal events from the UK General Practice Research Database (GPRD) and from the national registry of hospitalizations. Only a minority of the million patients who were prescribed conventional NSAIDs (average cost per prescription US$17.80) or coxibs (average cost per prescription US$47.04) for a variety of inflammatory conditions took them on a long-term daily basis, whereas in the RCTs of coxibs, patients with a few carefully defined conditions took NSAIDs daily for at least 6-9 months. The researchers then developed a cost-effectiveness model to evaluate the costs of the alternative strategies of prescribing a conventional NSAID or a coxib. The mean additional cost of preventing one gastrointestinal event recorded in the GPRD by using a coxib instead of an NSAID, they report, was US$104,000; the mean cost of preventing one hospitalization for such an event was US$298,000. By contrast, the mean cost of preventing one gastrointestinal event by using a coxib instead of an NSAID calculated from data obtained in RCTs was about US$20,000.
What Do These Findings Mean? These findings suggest that the published cost-effectiveness analyses of coxibs greatly underestimate the cost of preventing gastrointestinal events by replacing prescriptions of conventional NSAIDs with prescriptions of coxibs. That is, if data from actual clinical practice had been used in cost-effectiveness analyses rather than data from RCTs, the conclusions of the published cost-effectiveness analyses of coxibs would have been radically different and may have led to different prescribing guidelines for this class of drug. More generally, these findings provide a good illustration of how important it is to ensure that cost-effectiveness analyses have "external" validity by using realistic estimates for event rates and costs rather than relying on data from RCTs that do not always reflect the real-world situation. The researchers suggest, therefore, that health technology assessments should move from evaluating cost-efficacy in ideal populations with ideal interventions to evaluating cost-effectiveness in real populations with real interventions.