Appropriate Use of Cardiac Stress Testing with Imaging: A Systematic Review and Meta-Analysis

Background: Appropriate use criteria (AUC) for cardiac stress tests address concerns about utilization growth and patient safety. We systematically reviewed studies of appropriateness, including within physician specialties; evaluated trends over time and in response to AUC updates; and characterized leading indications for inappropriate/rarely appropriate testing.
Methods: We searched PubMed (2005–2015) for English-language articles reporting stress echocardiography or myocardial perfusion imaging (MPI) appropriateness. Data were pooled using random-effects meta-analysis and meta-regression.
Results: Thirty-four publications covering 41,578 patients were included, primarily from academic centers. Stress echocardiography appropriate testing rates were 53.0% (95% CI, 45.3%–60.7%) and 50.9% (42.6%–59.2%) and inappropriate/rarely appropriate rates were 19.1% (11.4%–26.8%) and 28.4% (23.9%–32.8%) using 2008 and 2011 AUC, respectively. Stress MPI appropriate testing rates were 71.1% (64.5%–77.7%) and 72.0% (67.6%–76.3%) and inappropriate/rarely appropriate rates were 10.7% (7.2%–14.2%) and 15.7% (12.4%–19.1%) using 2005 and 2009 AUC, respectively. There was no significant temporal trend toward rising rates of appropriateness for stress echocardiography or MPI. Unclassified stress echocardiograms fell by 79% (p = 0.04) with updated AUC. There were no differences between cardiac specialists and internists.
Conclusions: Rates of appropriate use tend to be lower for stress echocardiography compared to MPI, and updated AUC reduced unclassified stress echocardiograms. There is no conclusive evidence that AUC improved appropriate use over time. Further research is needed to determine if integration of appropriateness guidelines in academic and community settings is an effective approach to reducing inappropriate/rarely appropriate use of stress testing and its associated costs and patient harms.


Introduction
Cardiac imaging has advanced physicians' ability to diagnose and treat a variety of diseases, but rapid growth in the utilization and cost of imaging technology has spurred public and private insurers to scrutinize its use and construct policies aimed at reducing imaging expenditures. [1][2][3] Professional societies and clinical researchers have also taken steps to better characterize the value of cardiac imaging, [4][5][6] while also highlighting clinical scenarios under which imaging use is particularly low-value and unlikely to improve patients' health or management. While the Choosing Wisely campaign is perhaps the most widely recognized of these professional efforts to self-regulate use of low-value tests and procedures, it was preceded and informed, in part, by the American College of Cardiology's (ACC) development of appropriate use criteria (AUC) for cardiac imaging stress tests. [7] These AUC have expanded to inform the use of a variety of imaging studies and invasive procedures, but cardiac stress testing has been a focal point of attention, largely due to its wide dissemination, [2] radiation risks, [8] procedural risks, expense, and association with downstream testing and procedures, some of which are invasive. [9] However, until recently, little was known about the potential long-term impact of the ACC's appropriate use criteria on clinical decision-making in patients evaluated for ischemic heart disease. [10] We aimed to (1) systematically review studies of cardiac stress testing appropriateness, including appropriateness within physician specialties; (2) evaluate trends over time and in response to updates of AUC; and (3) characterize leading indications for inappropriate/rarely appropriate testing.
While a recent meta-analysis provided important insights into trends in appropriateness across several cardiac imaging modalities, [10] our study differs from this prior work in important ways: we include a greater number of published studies, report a wider range of information about patient characteristics in each study, provide information about indications for inappropriate/rarely appropriate testing, perform more robust analyses of appropriateness by physician specialty (we use both meta-regression and meta-analysis to compare cardiac specialists and internists), and apply a more rigorous method for evaluating temporal trends (we pooled more studies and adjusted for AUC version). Simply stated, we add a more methodologically rigorous meta-analysis to the literature on cardiac imaging appropriateness.

Search Strategy
We searched PubMed (which includes the MEDLINE database and other sources) from October 1, 2005 to March 1, 2015 for English-language articles reporting stress echocardiography and radionuclide myocardial perfusion imaging (MPI) appropriateness. Our search terms included the Medical Subject Headings exercise test, Cardiac Imaging Techniques, myocardial perfusion imaging, single photon emission computed tomography, and echocardiography; keywords identifying cardiac imaging stress tests, including stress test, thallium, sestamibi, Technetium, myocardial perfusion, MPI, SPECT, and echo; and keywords identifying appropriateness evaluations, including approp* (for "appropriate" and variants) and inapprop* (for "inappropriate" and variants). We identified additional publications through discussion between collaborators. Our report adheres to guidelines for systematic reviews recommended by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the Meta-analysis of Observational Studies in Epidemiology (MOOSE) group (see Supplemental materials).

Study Selection
Two investigators (J.L. and S.B.), working independently, in duplicate, identified studies eligible for further review after screening titles or abstracts. Studies then underwent full-text retrieval and data extraction if authors reported rates of appropriate or inappropriate cardiac stress testing based on published AUC. Studies were ineligible for inclusion if they focused on special populations (e.g., transplant candidates) whose clinical characteristics made them less representative of general populations undergoing cardiac stress testing, though we did include one study that enrolled only patients with acute chest pain. [11] When multiple studies reported appropriateness outcomes on identical or overlapping populations, only studies that reported unique outcomes were included (see Table 1 footnote for more details). When a cohort was evaluated with the original and updated AUC, both cohorts were included in the meta-analysis, but in separate strata. However, for meta-regression models, only the cohort enrolled in the year closest to the publication date of the AUC was included.

Data Extraction
Using a standardized protocol and reporting form, data were extracted on the following characteristics: (1) identifying information (first author, journal, country, institution, publication year); (2) AUC used (stress echocardiography 2008 or 2011 AUC, stress MPI 2005 or 2009 AUC); (3) patient characteristics (mean age, percentage of male patients, percentage of patients with a history of diabetes, hypertension, hyperlipidemia, body mass index > 30, myocardial infarction (MI), percutaneous transluminal coronary angioplasty (PTCA), or coronary artery bypass grafting (CABG)); (4) stress test characteristics (test used, type of stressor); (5) appropriateness patterns, including appropriateness stratified by physician specialty; and (6) indications for inappropriate/rarely appropriate testing. We recalculated appropriateness rates when authors excluded patients whose studies were unclassified, but did not include papers that did not report the number of patients who were unclassified. Disagreements between reviewers were resolved through discussion.

Statistical Analysis
The primary outcomes were the proportions of appropriate, inappropriate/rarely appropriate, uncertain/may be appropriate, and unclassified cardiac imaging stress tests. Patient characteristics were summarized after weighting by each study's sample size. When studies reported that no patients were categorized as unclassified, a 0.5 correction factor was added to that outcome to facilitate calculation of a rate and standard error. Appropriateness estimates were pooled using the DerSimonian-Laird random-effects model to account for between-study heterogeneity attributable to differences in patient populations and clinician practice patterns. Statistical heterogeneity was also assessed with the Cochran Q statistic (a weighted sum of squared differences between studies with a $\chi^2$ distribution) and the $I^2$ statistic, which is derived from the Q statistic ($I^2 = [(Q - df)/Q] \times 100$) and estimates the proportion of overall variation attributable to between-study heterogeneity rather than chance. Because rates of uncertain/may be appropriate and unclassified patients tended to be low, we log transformed these values to more accurately estimate their standard errors and confidence intervals. To assess for publication bias, we constructed funnel plots (standard error versus appropriateness rates) stratified by AUC and performed the Egger test when at least 10 studies were present. None of these plots or statistical tests raised concerns for publication bias.
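The pooling steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names (`proportion_and_variance`, `dersimonian_laird`) and the study counts are hypothetical, each study's variance is taken as the simple binomial variance of an untransformed proportion, and the confidence interval uses a normal approximation.

```python
import math


def proportion_and_variance(events, n):
    """Return one study's proportion and its binomial variance.

    Assumption: when a study reports zero events, 0.5 is added to the
    event count so that a rate and standard error can still be computed.
    """
    if events == 0:
        events = 0.5
    p = events / n
    return p, p * (1.0 - p) / n


def dersimonian_laird(estimates, variances):
    """Pool study estimates with the DerSimonian-Laird random-effects model."""
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw

    # Cochran Q: weighted sum of squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Method-of-moments estimate of the between-study variance, floored at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2 = [(Q - df) / Q] x 100, floored at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "q": q,
        "df": df,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical (appropriate tests, total tests) pairs, not data from the review
studies = [(120, 200), (90, 150), (210, 400)]
pairs = [proportion_and_variance(e, n) for e, n in studies]
result = dersimonian_laird([p for p, _ in pairs], [v for _, v in pairs])
print(result)
```

For the low-frequency outcomes that the text says were log transformed, the same function could be applied to log rates and their variances, with the pooled value exponentiated afterwards.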

Meta-regression for temporal trends and effects of AUC updates
We performed meta-regression to assess temporal trends in appropriate and inappropriate/rarely appropriate cardiac stress testing. Meta-regression in this context is limited by the possibility of ecological bias (sometimes referred to as "aggregation bias" or "ecological confounding"), [12] since appropriateness rates in different cohorts over time may not reflect overall trends in appropriateness. We hypothesized that academic setting, prevalence of risk factors for ischemic heart disease (gender, age, comorbidities), and physician specialty would influence rates of appropriateness. However, because many studies reported only a few risk factors, we limited our patient covariates to gender and age, so as not to significantly reduce sample size for these regression models. [12] Separate models pooled all stress echocardiography or MPI studies, and we included an indicator for the specific AUC used. The key variable in these models was time, as captured by the midpoint of the enrollment period. To avoid double-counting, when the same stress echocardiography or MPI cohort was evaluated with original and updated AUC, we used the AUC whose publication date was closest to the enrollment dates.
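The structure of such a model, a study-level rate regressed on enrollment midpoint with an indicator for AUC version, can be sketched as follows. This is a simplified illustration only: the text does not specify the estimator, so the sketch swaps in plain inverse-variance weighted least squares (a fixed-effect meta-regression with no between-study variance term), omits the age and gender covariates, and uses hypothetical function names and study values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def meta_regression(rates, variances, covariates):
    """Inverse-variance weighted least squares on study-level covariates.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    Assumption: weights are 1 / within-study variance; a random-effects
    meta-regression would add a between-study variance to each weight.
    """
    rows = [[1.0] + list(c) for c in covariates]  # prepend intercept column
    p = len(rows[0])
    w = [1.0 / v for v in variances]
    # Normal equations: (X' W X) beta = X' W y
    xtwx = [
        [sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(p)]
        for i in range(p)
    ]
    xtwy = [
        sum(wi * r[i] * y for wi, r, y in zip(w, rows, rates)) for i in range(p)
    ]
    return solve_linear_system(xtwx, xtwy)


# Hypothetical cohorts: (enrollment midpoint minus 2010, updated-AUC indicator)
covariates = [(-3, 0), (-1, 0), (0, 1), (2, 1), (3, 1)]
rates = [0.57, 0.59, 0.55, 0.57, 0.58]  # hypothetical appropriate-use rates
variances = [0.0012, 0.0016, 0.0006, 0.0010, 0.0008]
intercept, time_slope, auc_effect = meta_regression(rates, variances, covariates)
print(intercept, time_slope, auc_effect)
```

In this sketch `time_slope` is the coefficient of interest, the change in the appropriate-use rate per year of enrollment midpoint after adjusting for AUC version. Because each data point is a whole cohort, the slope is subject to the ecological bias noted above.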
We also used meta-regression to examine whether updated stress echocardiography and MPI AUC were associated with a reduction in unclassified patients, and to test whether cardiac specialists (cardiologists and cardiac surgeons) and internists had different rates of appropriate and inappropriate cardiac stress testing. Most studies reporting specialty appropriateness categorized physicians as cardiac specialists or non-cardiac specialists, but we considered the latter to be internists based on national referral patterns.
In a sensitivity analysis, we attempted to analyze trends in appropriateness within the same institution, but these meta-regression models were not estimable due to limited sample size. However, we provide raw appropriateness rates from these studies: for stress echocardiogra-
A test for differences in unclassified rates demonstrated that the updated stress echocardiography AUC in 2011 was associated with a significant reduction in the proportion of these tests (relative reduction = 79%, p = 0.04). There was no evidence of a reduction in unclassified studies after the updated stress MPI criteria were released in 2009 (relative reduction = 64%, p = 0.25).
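As a hedged illustration of how a relative reduction of this kind can be read from a log-scale model: the Statistical Analysis section states that unclassified rates were log transformed, but it does not give the model formula. Assuming a meta-regression of the log rate on an indicator $x$ for the updated AUC, with hypothetical coefficients $\beta_0$ and $\beta_1$, the relative reduction follows from the exponentiated coefficient:

```latex
\log(\text{unclassified rate}) = \beta_0 + \beta_1 x,
\qquad
\text{relative reduction} = 1 - e^{\beta_1}
```

Under this assumption, a 79% relative reduction corresponds to $\beta_1 = \ln(0.21) \approx -1.56$, and a 64% relative reduction to $\beta_1 = \ln(0.36) \approx -1.02$.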

Discussion
By systematically reviewing studies of cardiac stress testing AUC, we found that rates of appropriate use tended to be lower for stress echocardiography compared to stress MPI, and that rates of inappropriate/rarely appropriate use tended to be higher. In the patient recruitment years of 2005 to 2014, we also found that rates of appropriate testing did not change significantly for stress echocardiography or MPI. Importantly, we showed that rates of unclassified stress echocardiograms fell after release of the 2011 AUC, whereas no significant changes were identified after updated stress MPI AUC were released. We did not find differences in appropriateness between physician specialties, though these analyses were substantially limited by sparse reporting. Finally, we found significant variability in indications for inappropriate/rarely appropriate cardiac stress tests, with preoperative testing and testing of low-risk symptomatic or asymptomatic patients representing leading indications.
Our study demonstrates that early efforts of the ACC's Appropriateness Criteria Working Group have had durable and far-reaching consequences on the trajectory of academic inquiry into appropriate testing, with more than 41 diverse cohorts evaluated since publication of the original 2005 AUC. These evaluations have also extended into the community setting, though academic medical centers remain the dominant site for AUC evaluation. While the rapid growth in cardiac imaging that spurred initial efforts to develop AUC may be slowing, at least for stress MPI, the total number of cardiac stress test referrals in US ambulatory settings has not changed in recent years, and expenditures on inappropriate tests remain substantial. [1]
The main findings of our study are similar to those from a recently published meta-analysis by Fonseca et al, [10] but our study differs in important ways: we include a greater number of published studies; report a wider range of information about patient characteristics in each study, including hypertension, smoking, coronary artery disease/myocardial infarction, and obesity; provide information about indications for inappropriate/rarely appropriate testing, which was absent in Fonseca et al; perform more robust analyses of appropriateness by physician specialty (we use both meta-regression and meta-analysis to compare cardiac specialists and internists); and apply a more rigorous method for evaluating temporal trends (we pooled more studies and adjusted for AUC version, whereas Fonseca et al estimated separate models [and therefore had smaller sample sizes] for each AUC version). In the context of AUC design, our study suggests that the potential effects of AUC are unclear, but these analyses are limited by the absence of a control group, and they are vulnerable to ecological bias. [12] We found no conclusive evidence of a trend over time in appropriate or inappropriate/rarely appropriate stress echocardiograms or MPIs.
These results are in agreement with the work of Fonseca et al, [10] though our study samples differed (we captured more recently published studies), we used enrollment year instead of publication year as our measure of time, and we included an indicator in our meta-regression models for the AUC version used rather than separately treating studies that used different AUC. Our findings are also similar to the results of another recent meta-analysis that focused on stress MPI. [47] It is important to note, however, that there is substantial uncertainty about the extent to which findings within different cohorts in our meta-analysis reflect general trends. [12] Further, we did not account for geographic variation in appropriate and inappropriate use of cardiac imaging, which may be an important source of confounding.
Notably, a greater number of stress MPI publications reported the results of quality improvement initiatives, such as one study that reported the effects of FOCUS (Formation of Optimal Cardiovascular Utilization Strategies), a Web-based community and quality improvement tool. [43] We hypothesized that higher expenditures on stress MPI and expansion of administrative controls such as prior authorization requirements, in combination with widening public concerns about radiation exposure, may have engendered a climate of urgency in the context of stress MPI. It is also possible that these factors had the unintended consequence of shifting ordering practices toward stress echocardiography, to avoid stress MPI in questionable and other scenarios. Nonetheless, more concerted efforts to increase appropriate use of stress echocardiography and MPI and reduce inappropriate/rarely appropriate use are needed.
Our findings have important implications for insurers and policymakers. A substantial proportion of cardiac imaging stress tests remain inappropriate/rarely appropriate, and our pooled estimates, based largely on studies from academic medical centers, may underestimate the inappropriate/rarely appropriate use of this technology and overestimate its appropriate use in the community. Notably, some studies, such as Doukky et al, focused on patients undergoing testing in a community setting. [35] These inappropriate/rarely appropriate tests increase healthcare expenditures and are less likely to yield positive findings or improve patients' health outcomes. It is important to recognize that a goal of zero inappropriate/rarely appropriate use is not only unrealistic but undesirable, as each patient represents unique considerations. While the optimal proportion is unknown, it is likely in the range of 10%, though no formal benchmarks have been proposed. Related to this, the small proportion of unclassified studies and relatively modest proportion of studies with uncertain appropriateness suggest that AUC may be an effective tool for evaluating the value of cardiac imaging stress tests, independent of prior authorization mechanisms and radiology benefits managers. Thus, wider incorporation and application of AUC, particularly in integrated health systems and accountable care organizations, could reduce the need for these alternate methods for constraining unnecessary utilization.
Introduction of the 2013 multimodality AUC adds calcium scoring and nonimaging exercise testing to the cohort of technologies subject to appropriate use review. We attempted to integrate the multimodality AUC into our meta-analysis but no studies rigorously implementing it were available at the time of our literature search. However, we did adopt the terminology of the multimodality AUC (e.g., "rarely appropriate" instead of "inappropriate") to more closely align our results with current interpretation of appropriateness. Assessing its effects, particularly in cohorts that have previously been evaluated with earlier AUC versions, will provide important insights into the overall effect of multimodality criteria, with possible implications for insurers and policymakers. In the Prospective Multicenter Imaging Study for Evaluation of Chest Pain (PROMISE) trial, [6] all patients had chest pain, shortness of breath, or other symptoms as well as cardiovascular risk factors, and therefore would be considered appropriate candidates for cardiac imaging stress tests by these criteria. However, the routine performance of cardiac stress testing in patients without symptoms remains an important issue. [1]
Our study has several limitations. The majority of AUC evaluations were set in academic medical centers, where clinicians often care for higher-risk patients, may be more aware of AUC, and typically face weaker financial incentives to perform cardiac imaging stress tests. Because of small sample size and sparse reporting, we were unable to include a robust set of covariates in our examination of temporal trends.
Moreover, meta-regression has significant limitations, including ecological bias (sometimes referred to as "aggregation bias" or "ecological confounding"), [12] and confounding from omitted variables (such as geographic variation in appropriate and inappropriate use of cardiac imaging, and clinical differences in the patient populations referred for stress echocardiography versus MPI), a risk shared by other analytic models of non-randomized, observational data. Further, the absence of a control group in our study attenuated our ability to causally link temporal changes in appropriateness to AUC development. Other ecological factors, including diffusion of radiology benefit managers and prior authorization programs, reductions in Medicare reimbursement, and the Choosing Wisely campaign, may also have contributed. In addition, application of AUC was not standardized across studies, so use of different methodologies could lead to different conclusions about appropriateness.
Recent AUC versions perform well for definitively categorizing the vast majority of stress echocardiograms and MPIs, but we found no conclusive evidence that diffusion of AUC increased the appropriate use of stress echocardiography or MPI. Overall rates of inappropriate/rarely appropriate testing are relatively low in academic settings, and integration of appropriateness guidelines in both academic and community settings may be an effective approach to further reducing the inappropriate/rarely appropriate use of cardiac stress testing and its associated costs and patient harms.