How Methodologic Differences Affect Results of Economic Analyses: A Systematic Review of Interferon Gamma Release Assays for the Diagnosis of LTBI

Introduction. Cost-effectiveness analyses (CEA) can provide useful information on how to invest limited funds; however, they are less useful if different analyses of the same intervention provide unclear or contradictory results. The objective of our study was to conduct a systematic review of methodologic aspects of CEA that evaluate Interferon Gamma Release Assays (IGRA) for the detection of Latent Tuberculosis Infection (LTBI), in order to understand how methodologic differences affect study results.
Methods. A systematic review of studies was conducted with particular focus on study quality and on the variability of the inputs used in the models that assessed cost-effectiveness. A common decision analysis model of the IGRA versus Tuberculin Skin Test (TST) screening strategies was developed and used to quantify the impact of the observed differences in model inputs, taken from the studies identified, on predicted results.
Results. Thirteen studies were ultimately included in the review. Several specific methodologic issues were identified across studies, including how study inputs were selected, inconsistencies in the costing approach, the utility of the QALY (Quality Adjusted Life Year) as the effectiveness outcome, and how authors chose to present and interpret study results. When the IGRA and TST strategies were compared using our common decision analysis model, predicted effectiveness largely overlapped.
Implications. Many methodologic issues that contribute to inconsistent results and reduced study quality were identified in studies that assessed the cost-effectiveness of the IGRA test. More specific and relevant guidelines are needed to help authors standardize modelling approaches, inputs and assumptions, as well as the presentation and interpretation of results.


Introduction
Global tuberculosis (TB) control is currently facing great opportunities, but also great challenges. Opportunities for improved TB control have increased dramatically over the past decade as the result of greater funding from governments of low and middle income countries (LMICs) and from international donors and funding agencies [1]. At the same time, the number of new tools, particularly in the area of TB diagnostics, has expanded rapidly, providing a wide array of potential technologies for implementation [2]. One of the greatest challenges for governments and donor agencies is to decide where to invest resources to achieve the greatest benefit for the most people.
Economic analyses can provide decision makers with more information on which to base investment decisions, by comparing costs and resulting health benefits of different approaches. Cost Effectiveness Analyses (CEA) are one of the most commonly used economic analyses in published studies [3]. The cost per unit of outcome or health effect of different interventions can be estimated and compared [3]. If CEAs are conducted with rigorous, standardized and transparent methods, results of different analyses should be comparable and help policy makers reach consensus on interventions to be implemented in a particular population or setting [4]. However, if different analyses of the same intervention produce contradictory results, this may heighten confusion and even discredit the value of these analyses.
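To make "cost per unit of outcome" concrete: comparisons between two strategies are conventionally summarized as an incremental cost-effectiveness ratio (ICER), the difference in expected cost divided by the difference in expected effect. The short Python sketch below is illustrative only. It assumes each strategy has already been reduced to an expected cost and an expected effect per person, and the `Strategy` class, the `incremental_ratio` helper and every number in it are hypothetical rather than drawn from any study in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    cost: float    # expected cost per person screened (e.g. 2011 USD)
    effect: float  # expected health effect per person (e.g. QALYs)


def incremental_ratio(new: Strategy, comparator: Strategy) -> float:
    """ICER: extra cost per extra unit of effect of `new` over `comparator`."""
    d_cost = new.cost - comparator.cost
    d_effect = new.effect - comparator.effect
    if d_effect == 0:
        raise ZeroDivisionError("ICER is undefined when effects are identical")
    return d_cost / d_effect


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the arithmetic.
    tst = Strategy("TST", cost=500.0, effect=10.0)
    igra = Strategy("IGRA", cost=520.0, effect=10.25)
    print(f"{igra.name} vs {tst.name}: {incremental_ratio(igra, tst):.0f} per QALY")
```

In this made-up example the new strategy costs 20 more per person for 0.25 additional QALYs, i.e. 80 per QALY gained. The ratio has no meaning when the effects are identical, so the helper raises an error in that case instead of returning a number.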
The area of diagnostics for latent TB infection (LTBI) serves as an excellent example of this phenomenon. Until relatively recently, a single test, the Tuberculin Skin Test (TST), was the only method to diagnose LTBI. In the past decade, Interferon Gamma Release Assays (IGRAs) have been approved for this purpose in many countries, leading to a wave of studies of their accuracy and utility [5,6]. These have included cost-effectiveness analyses, which have provided seemingly contradictory messages. In general, systematic reviews are designed to synthesize evidence after careful assessment of the methodological quality of all available relevant studies on a particular topic [4]. For economic analyses in particular, the goal of a systematic review is not to produce statements about whether a particular intervention is cost-effective, but rather to summarize what is known from different settings about the economic aspects of interventions, and to encourage a more transparent and consistent approach to the conduct and reporting of economic analyses [4]. The objective of our study was thus to conduct a systematic review of methodologic aspects (study quality, inputs and methodologic approach) of CEA that evaluate IGRAs for the detection of LTBI, in order to assess whether methodologic differences could account for differences in study findings and conclusions. A second objective was to develop a common decision analysis model that could quantify the impact of the observed differences in study inputs on predicted costs and effectiveness.

Ethics Statement
An ethics statement was not required for this work.

Systematic Review
Search criteria. We searched for CEA that compared IGRAs with at least one other test strategy for diagnosing LTBI.
Included studies used modeling techniques to make predictions about specific outcomes over time, with any analytic horizon. No limits on year of publication or language were imposed. Predicted outcomes of interest included Quality Adjusted Life Years (QALYs), active TB cases and total predicted costs. Studies were excluded if they: 1) used animal subjects; 2) assessed detection of active disease; 3) were conference abstracts or proceedings; 4) assessed detection of non-tuberculous mycobacterial infection or disease; or 5) used non-standard tests for LTBI.
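The inclusion and exclusion rules above combine as a simple conjunction: a study is kept only if both inclusion criteria hold and none of the five exclusion criteria does. A minimal sketch follows, assuming each candidate study has been coded as a set of yes/no flags. The `Record` fields and the `is_eligible` helper are hypothetical names for the criteria listed. Year of publication and language are deliberately absent because no limits were imposed on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical screening flags for one candidate study."""
    compares_igra_with_other_test: bool  # IGRA vs at least one other LTBI test strategy
    is_modelling_study: bool             # predicts outcomes over time (any horizon)
    animal_subjects: bool
    detects_active_tb: bool
    abstract_or_proceedings: bool
    detects_ntm: bool                    # non-tuberculous mycobacteria
    non_standard_ltbi_test: bool


def is_eligible(r: Record) -> bool:
    """Include only if both inclusion criteria hold and no exclusion criterion does."""
    meets_inclusion = r.compares_igra_with_other_test and r.is_modelling_study
    meets_exclusion = any((
        r.animal_subjects,
        r.detects_active_tb,
        r.abstract_or_proceedings,
        r.detects_ntm,
        r.non_standard_ltbi_test,
    ))
    return meets_inclusion and not meets_exclusion
```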
Search methods. We searched the following databases from 1947 up to March 15th, 2011 for relevant studies: Scopus, Web of Science, Medline, Embase, CINAHL, Cochrane Library, CRD, EconLit, CEA Registry and LILACS. An update was performed on August 31, 2011. In addition to these databases, the reference lists of identified publications were hand searched. A sample search string used for the Medline search can be found in Table S1.
Study selection. Two independent reviewers screened all titles and abstracts to select studies for full-text review. Full-text review to finalize study selection was done independently by the same two reviewers, and any disagreements were resolved by a third reviewer.
Data abstraction. A standardized data abstraction form was developed and piloted on a subset of studies. Once finalized, the form was used by two reviewers to extract data independently. Data were extracted on the following topics: 1) general information, 2) model/economic inputs and assumptions, 3) study input data sources, 4) predicted outcomes and 5) study quality (using the Drummond checklist; see below for more detail). More detail on the types of data abstracted by study can be found in Table S2. Data from both reviewers were compared to ensure accuracy of data abstraction, and any differences between reviewers were resolved by discussion with a third reviewer. Authors were contacted for clarification or if key information was missing.
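The double-extraction check amounts to comparing the two reviewers' forms field by field and passing every mismatch to the third reviewer. The sketch below assumes each completed form is stored as a flat dictionary of field names to values. The `discrepancies` helper and the field names are hypothetical and are not part of the review's actual form (Table S2).

```python
from typing import Any, Dict, Tuple


def discrepancies(form_a: Dict[str, Any], form_b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Fields on which two independent abstraction forms disagree.

    A field missing from one form is read as None, so omissions are flagged
    for the third reviewer as well as conflicting values.
    """
    fields = sorted(set(form_a) | set(form_b))
    return {
        f: (form_a.get(f), form_b.get(f))
        for f in fields
        if form_a.get(f) != form_b.get(f)
    }


if __name__ == "__main__":
    # Hypothetical extractions for one study (field names are illustrative).
    reviewer_1 = {"tst_cost_usd": 40, "igra_cost_usd": 95, "horizon_years": 20}
    reviewer_2 = {"tst_cost_usd": 40, "igra_cost_usd": 59, "perspective": "health system"}
    for field, (a, b) in discrepancies(reviewer_1, reviewer_2).items():
        print(f"resolve with third reviewer: {field}: {a!r} vs {b!r}")
```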
Assessment of study quality. The overall methodological quality of each study was evaluated using the 35-item checklist of Drummond et al. [7]. Each item was scored using the mutually exclusive categories "Yes", "No", "Not clear" or "Not applicable". We also conducted a detailed qualitative comparison of the results reported in the main text with the conclusions stated in the abstract.
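Summary scores such as an average proportion of "Yes" answers can be computed per study and then averaged across studies. This is a sketch under stated assumptions: the text does not say whether "Not applicable" items were removed from the denominator, so that choice is exposed as the hypothetical `exclude_not_applicable` parameter, and the two example studies are invented.

```python
from collections import Counter
from statistics import mean
from typing import List

CATEGORIES = {"Yes", "No", "Not clear", "Not applicable"}


def proportion_yes(scores: List[str], exclude_not_applicable: bool = False) -> float:
    """Share of one study's checklist items scored "Yes".

    Whether "Not applicable" items belong in the denominator is not specified
    in the text, so it is left as a parameter here.
    """
    unknown = set(scores) - CATEGORIES
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    counts = Counter(scores)
    denominator = len(scores)
    if exclude_not_applicable:
        denominator -= counts["Not applicable"]
    if denominator == 0:
        raise ValueError("no scorable items")
    return counts["Yes"] / denominator


if __name__ == "__main__":
    # Two hypothetical studies, each scored on all 35 items.
    study_a = ["Yes"] * 30 + ["No"] * 3 + ["Not applicable"] * 2
    study_b = ["Yes"] * 21 + ["Not clear"] * 14
    average = mean(proportion_yes(s) for s in (study_a, study_b))
    print(f"average proportion Yes: {average:.1%}")
```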

Summarizing Variability in Study Inputs and Predicted Results
For each study the following model inputs were abstracted: test characteristics, transitional probabilities (e.g. risk of disease if infected), and costs, particularly the specific components of the cost of the IGRA and the TST. Predicted outcomes abstracted included cost per person screened and effectiveness measures (QALYs or active cases) by test scenario. All costs were converted to 2011 US dollars.
Assessment of Impact of Variability in Study Inputs using a Common Model
Decision analysis model. We developed a common decision analysis Markov model using TreeAge software (TreeAge, Version 2011) that incorporated the basic structure and consequences of all of the models used in the studies included in the review. As shown in Figure S1, the model simulates two identical population cohorts, some of whom are infected with TB. In the first year of the simulation, the first cohort is tested with an IGRA and the second with a TST. Depending on the underlying TB health state of the population and the characteristics of the test being used, each person falls into one of four mutually exclusive states (true positive, false positive, true negative or false negative), each with its own consequences. For example, some of those who test positive may adhere to treatment and complete an effective course of therapy, resulting in no negative outcome. However, non-completion or ineffective therapy can also occur, which can lead to the development of active TB, a negative predicted outcome. Adverse events can also occur in anyone who is treated, regardless of underlying TB health state. Once the cohort completes the screening and treatment process in the first year of the model, those who are infected and remain with LTBI cycle into an "infected state" in the next year of the model and may later reactivate and develop active disease. Those who are cured by preventive therapy, or who reactivate to active disease, do not continue to cycle in subsequent years.
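The cohort logic described above can be approximated in a few lines of Python. This is not the authors' TreeAge model. It is a simplified sketch that assumes everyone who tests positive starts LTBI therapy and incurs its full cost, that only true positives who complete an effective course are cured, that everyone else who is infected reactivates at a constant annual risk, and that costs are undiscounted. The `Inputs` fields, `run_cohort` and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Inputs:
    """Per-person inputs for one test strategy (names and values are illustrative)."""
    prevalence: float      # prevalence of LTBI in the screened cohort
    sensitivity: float
    specificity: float
    completion: float      # P(test-positive person completes LTBI therapy)
    efficacy: float        # P(completed therapy prevents reactivation)
    p_adverse: float       # P(adverse event | treated)
    reactivation: float    # annual reactivation risk while still infected
    cost_test: float
    cost_ltbi_tx: float    # charged to everyone who starts therapy (assumption)
    cost_adverse: float
    cost_active_tb: float


def run_cohort(p: Inputs, years: int = 20) -> Tuple[float, float]:
    """Return (expected cost per person, probability of becoming an active case).

    `years` is the number of annual cycles after screening.
    """
    # Year 1: screening sorts the cohort into TP / FP / TN / FN.
    tp = p.prevalence * p.sensitivity
    fp = (1 - p.prevalence) * (1 - p.specificity)
    treated = tp + fp  # all test-positives start therapy, infected or not
    cost = p.cost_test + treated * (p.cost_ltbi_tx + p.p_adverse * p.cost_adverse)

    # Only true positives who complete an effective course are cured and stop cycling.
    cured = tp * p.completion * p.efficacy
    infected = p.prevalence - cured  # false negatives + uncured true positives

    # Annual cycles: remaining infections may reactivate; reactivated cases exit.
    active = 0.0
    for _ in range(years):
        new_cases = infected * p.reactivation
        active += new_cases
        infected -= new_cases
    cost += active * p.cost_active_tb
    return cost, active


if __name__ == "__main__":
    shared = dict(prevalence=0.1, completion=0.5, efficacy=0.9, p_adverse=0.01,
                  reactivation=0.005, cost_ltbi_tx=500.0, cost_adverse=1000.0,
                  cost_active_tb=15000.0)
    tst = Inputs(sensitivity=0.8, specificity=0.9, cost_test=40.0, **shared)
    igra = Inputs(sensitivity=0.8, specificity=0.97, cost_test=90.0, **shared)
    for name, inputs in (("TST", tst), ("IGRA", igra)):
        cost, active = run_cohort(inputs)
        print(f"{name}: cost {cost:.0f} per person, P(active TB) {active:.3%}")
```

In the example both strategies share the same sensitivity, so they predict the same probability of active TB and differ only in cost, through the test price and the treatment of false positives. This is a constructed illustration, not a result of the review.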
Assessing impact of input variability on predicted results. In this common model, all pathogenetic and cost inputs were defined using the distribution of input values used in the different studies included in the review. The inputs that affect effectiveness and the cost inputs are summarized in Tables 1 and 2, respectively. Monte Carlo probabilistic sensitivity analysis with 10,000 iterations was used to define the distribution of outcomes (costs and effectiveness) for each test scenario. Effectiveness was defined as the probability of being an active case. The 2.5th and 97.5th percentiles were calculated for each distribution, and predicted results were plotted to compare the strategies visually.
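The probabilistic sensitivity analysis can be sketched as repeated sampling of the inputs followed by percentiles of the resulting outcome distribution. The sketch assumes each input is drawn uniformly between the lowest and highest value used across studies (the text does not state which distributions were used), substitutes a deliberately simplified closed-form `p_active` outcome for the full model, and covers effectiveness only; the ranges shown are made up. `statistics.quantiles` with `n=40` returns 39 cut points spaced at 2.5%, so the first and last are the 2.5th and 97.5th percentiles.

```python
import random
from statistics import quantiles
from typing import Dict, Tuple


def p_active(prevalence: float, sensitivity: float, completion: float,
             efficacy: float, reactivation: float, years: int = 20) -> float:
    """Probability of becoming an active case in a simplified screen-and-treat model."""
    still_infected = prevalence * (1 - sensitivity * completion * efficacy)
    return still_infected * (1 - (1 - reactivation) ** years)


def monte_carlo(ranges: Dict[str, Tuple[float, float]], n: int = 10_000,
                seed: int = 2011) -> Tuple[float, float]:
    """Sample every input uniformly within its (min, max) across studies and
    return the 2.5th and 97.5th percentiles of the predicted outcome."""
    rng = random.Random(seed)
    draws = [
        p_active(**{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()})
        for _ in range(n)
    ]
    cuts = quantiles(draws, n=40, method="inclusive")  # 39 cut points at 2.5% steps
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    # Hypothetical min/max values; the review's actual inputs are in Tables 1 and 2.
    igra_ranges = {
        "prevalence": (0.01, 0.30),
        "sensitivity": (0.70, 0.95),
        "completion": (0.30, 0.80),
        "efficacy": (0.60, 0.93),
        "reactivation": (0.001, 0.01),
    }
    low, high = monte_carlo(igra_ranges)
    print(f"P(active TB): 2.5th pct {low:.3%}, 97.5th pct {high:.3%}")
```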
One-way sensitivity analysis was then conducted on each variable included in this common decision analysis model to quantify the relative impact of differences in study inputs on predicted total costs over a 20-year analytic horizon. For this analysis, the input range for each variable was taken from the minimum and maximum of the values used in the different studies included in the review. For each variable we calculated the spread (the difference between the highest and lowest predicted values) and the potential influence (the spread divided by the mean expected value for each test scenario).
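Spread and potential influence can be computed directly from these definitions. In this sketch each input is evaluated on an evenly spaced grid between its minimum and maximum (rather than only at the two endpoints, which would miss non-monotonic effects), and the base-case prediction is assumed to stand in for the "mean expected value". The `one_way` helper, the `total_cost` model, its inputs and their ranges are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def one_way(model: Callable[..., float], base: Dict[str, float],
            ranges: Dict[str, Tuple[float, float]], steps: int = 11
            ) -> List[Tuple[str, float, float]]:
    """Vary one input at a time over its (min, max) range, others held at base.

    Returns (name, spread, potential influence) sorted by spread, largest first.
    Spread = highest minus lowest predicted value over an evenly spaced grid;
    potential influence = spread / base-case prediction (assumed here to stand
    in for the "mean expected value").
    """
    reference = model(**base)
    rows = []
    for name, (lo, hi) in ranges.items():
        grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
        outputs = [model(**{**base, name: value}) for value in grid]
        spread = max(outputs) - min(outputs)
        rows.append((name, spread, spread / reference))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical total-cost model and ranges, for illustration only.
    def total_cost(prevalence, reactivation_20y, cost_active_tb, cost_test):
        return cost_test + prevalence * reactivation_20y * cost_active_tb

    base = {"prevalence": 0.1, "reactivation_20y": 0.05,
            "cost_active_tb": 15000.0, "cost_test": 60.0}
    ranges = {"prevalence": (0.01, 0.3), "reactivation_20y": (0.02, 0.1),
              "cost_active_tb": (5000.0, 40000.0), "cost_test": (20.0, 200.0)}
    for name, spread, influence in one_way(total_cost, base, ranges):
        print(f"{name:18s} spread {spread:8.1f}  potential influence {influence:.0%}")
```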

Studies Included in Review
As shown in Figure 1, the initial search found 714 unique references. After review of titles, abstracts and full text, 11 studies met the inclusion criteria and were included in the review. Two additional studies published after the initial review were added when the search update was performed. A summary of the 13 studies included in the review is provided in Table 3 [13–25]. Studies mostly considered populations in high-income countries. A variety of sub-populations were considered, including contacts, immigrants and health care workers.

Study Quality
For all studies included in the review, the average proportion of "Yes" values given on the quality checklist was 72%. The breakdown of "Yes" versus "No"/"Not clear" for each of the 35 checklist items is summarized in Figure S2 and Table S3. The conclusions on cost-effectiveness stated in the main text and in the abstract were fully consistent in only 5 studies.

Variability of Inputs
Key epidemiologic model inputs reported in studies varied extensively, as shown in Table 1; more detail on these inputs is provided in Table S4. Even after adjustment for inflation and currency, cost inputs also varied widely (Table 2). For example, the cost of the TST varied from $17 to $121 (2011 USD) and that of the IGRA from $21 to $219 (2011 USD). An in-depth examination of the costing components of these parameters (Table S5) showed that the approach to costing differed in every study. For example, 6 studies included costs from the patient perspective, 9 explicitly stated that they included "indirect" costs or costs for medical staff time to conduct the tests, and 7 explicitly stated that blood draw/phlebotomy costs associated with the IGRA were included.
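Putting costs reported in different currencies and years on a common 2011 USD basis typically involves a price-index adjustment and a currency conversion. The text does not say which indices or exchange rates were used, so the sketch below assumes one common ordering (inflate in the local currency, then convert at the 2011 exchange rate) and uses invented index values. `to_2011_usd` is a hypothetical helper.

```python
from typing import Dict


def to_2011_usd(amount: float, year: int, price_index: Dict[int, float],
                usd_per_local_2011: float) -> float:
    """Convert a cost reported in local currency of `year` into 2011 USD.

    Assumed order: inflate within the local currency using a local price index,
    then convert with the 2011 exchange rate. Other orderings (or PPP rates)
    are also used in practice and can give different answers.
    """
    if year not in price_index or 2011 not in price_index:
        raise KeyError("price index needs entries for the cost year and 2011")
    inflated_local = amount * price_index[2011] / price_index[year]
    return inflated_local * usd_per_local_2011


if __name__ == "__main__":
    # Made-up index values and exchange rate, for illustration only.
    index = {2007: 100.0, 2011: 110.0}
    print(f"{to_2011_usd(50.0, 2007, index, 1.5):.2f} USD (2011)")
```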

Variability of Predicted Results
In all studies, predicted effectiveness measures (QALYs gained or active cases prevented) were almost identical across test scenarios (Table 4). QALYs gained from use of the IGRA relative to the TST are also shown as days of life gained, to emphasize how small the differences in effectiveness were. Of all studies that compared the effectiveness of the IGRA versus the TST, only one predicted a gain of more than 1 day with the IGRA over an analytic horizon of 20 years or more. In contrast, predicted cost differences between the IGRA and the TST varied widely between studies, and between sub-populations within the same study (Table 5).
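The conversion from QALYs to days is a single multiplication, which also makes the one-day benchmark easy to state: a gain of one quality-adjusted day corresponds to about 0.0027 QALY. The sketch assumes 365.25 days per year (some analyses use 365). The helper names and the example QALY gains are hypothetical.

```python
DAYS_PER_YEAR = 365.25  # assumption; some analyses use 365


def qalys_to_days(delta_qaly: float) -> float:
    """Express an incremental QALY difference as quality-adjusted days of life."""
    return delta_qaly * DAYS_PER_YEAR


def qalys_for_one_day() -> float:
    """QALY gain that corresponds to exactly one quality-adjusted day."""
    return 1 / DAYS_PER_YEAR


if __name__ == "__main__":
    for gain in (0.0005, 0.001, 0.003):  # hypothetical incremental QALYs, IGRA vs TST
        print(f"{gain} QALY = {qalys_to_days(gain):.2f} days")
```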

Assessment of Impact of Variability of Study Inputs using a Common Model
The distributions of predicted effectiveness (Figure 2a) and costs (Figure 2b) largely overlapped when these outcomes were predicted from Monte Carlo simulations using our common decision analysis model. The 2.5th and 97.5th percentiles for predicted effectiveness (probability of being an active case) were very similar: 0.246% and 4.07% for the TST strategy, and 0.244% and 3.98% for the IGRA strategy. Predicted costs showed more of a difference between strategies, with 2.5th and 97.5th percentiles of $398 and $2251 for the TST strategy and $279 and $1953 for the IGRA strategy.
Using the same model, when the model inputs were varied in one-way sensitivity analyses, the predicted spread of costs was large (Table 6). For both strategies the parameter with the greatest spread, and thus the greatest potential influence in the model, was the "prevalence of LTBI" (potential influence: 147% and 97% for IGRA and TST, respectively). The reactivation rate in the absence of effective LTBI therapy was also important (potential influence: 115% and 92% for IGRA and TST, respectively). The costs of treating active TB and LTBI were next in the ranking of influential parameters. The full ranking of all parameters is reported in Table 6.

Discussion
Thirteen cost-effectiveness papers were reviewed in our study. Differences in estimated effectiveness were consistently very small in all studies. Although quality was in general deemed satisfactory, assumed input costs and transitional probabilities were very inconsistent. As a result, predicted costs and cost-effectiveness varied widely. Although CEAs are supposed to provide objective evidence for decision making, they are less useful when studies present widely discrepant results. A lack of standardization and divergence in CEA methods led to the development of the recommendations set out by the Panel on Cost-Effectiveness in Health and Medicine in 1996 [26]. Despite the existence of such recommendations, many issues remain in how CEAs are conducted. Some problems appear to stem from how well authors can implement guidelines in practical terms. However, the appropriateness of guidelines for specific areas of evaluative research is also of some concern.
The detailed review of methods performed in this systematic review identified several specific methodologic issues relating to data analysis, presentation and interpretation of CEA findings. The implications of some of these issues are discussed in more detail below:

Selection of Study Inputs
Estimates of pathogenetic inputs, costs and test characteristics varied widely between studies. Even though these inputs played an important role in determining results, much of the variability in input values was not well justified. As highlighted in the evaluative criteria for economic studies of Drummond et al. [7], inputs should be derived from systematic reviews and meta-analyses whenever possible.

Approach to Costing
The approach to costing, including which specific cost components were included, varied by study, and this had an important impact on conclusions about cost-effectiveness. The recommendations of the Panel on Cost-Effectiveness in Health and Medicine [26] on the ideal approach to costing should be followed whenever possible. However, authors often face practical limitations and in certain cases may have to rely on cost data that are easily obtainable.

Use of the Effectiveness Measure in Diagnostic Studies
The differences in effectiveness between test strategies were so small as to be clinically meaningless. Although cost-effectiveness is determined by both differences in effectiveness and differences in cost, the latter were identified as the main determinant of study results in this particular area. The QALY is recommended by the Panel on Cost-Effectiveness in Health and Medicine as the ideal measure of health effectiveness [26]. However, this study demonstrates a weakness of this measure for diagnostic studies. Given that none of the conventional measures of effectiveness captured meaningful differences between the two testing strategies for the detection of latent TB infection, economic studies in this area should focus on cost alone.
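Why small effectiveness differences leave cost as the deciding factor can be seen from the arithmetic of the ICER: with the cost difference fixed, the ratio scales with 1/ΔE, so tiny changes in an already tiny denominator swing the result. In the hypothetical sweep below, where IGRA is assumed to cost 30 USD more per person, a change of 0.0005 QALY in ΔE (under a fifth of a day) moves the ICER from 30,000 to 60,000 USD per QALY, across a 50,000 USD/QALY threshold, and a sign change in ΔE makes the ratio negative. The `icer_sweep` helper and all values are illustrative.

```python
from typing import List, Tuple


def icer_sweep(d_cost: float, d_effects: List[float]) -> List[Tuple[float, float]]:
    """ICER (cost per QALY) for a fixed cost difference across several effect differences."""
    return [(d_e, d_cost / d_e) for d_e in d_effects]


if __name__ == "__main__":
    # Hypothetical: IGRA costs 30 USD more per person than TST.
    for d_e, ratio in icer_sweep(30.0, [0.002, 0.001, 0.0005, -0.0005]):
        print(f"dE = {d_e:+.4f} QALY ({d_e * 365.25:+.2f} days) -> ICER {ratio:,.0f} USD/QALY")
```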

Presentation and Interpretation of Cost-Effectiveness Data
Issues were identified in the presentation and interpretation of data, with many studies not clearly stating which test was "the most cost effective". Conclusions that a certain strategy was "cost effective" or "highly cost effective" were frequently either left undefined or based on a willingness-to-pay threshold of $50,000 per QALY gained. This benchmark was developed for evaluating the cost-effectiveness of interventions for end-stage renal disease in the US in the 1980s [27], and so may not be appropriate for the testing scenarios, countries or populations being considered.
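One way to make the threshold explicit is to report the verdict as a function of a stated willingness to pay rather than an implicit cutoff. The sketch below handles the dominance cases first and otherwise uses the incremental net monetary benefit, NMB = WTP × ΔE − ΔC. The `classify` helper is hypothetical, and the example deliberately evaluates the same invented result at two thresholds to show that the verdict can flip.

```python
def classify(d_cost: float, d_effect: float, wtp_per_qaly: float) -> str:
    """Label a new strategy against its comparator at an explicitly stated threshold.

    d_cost, d_effect: incremental cost and QALYs (new minus comparator).
    Outside the dominance cases, uses incremental net monetary benefit,
    NMB = wtp * dE - dC.
    """
    if d_cost == 0 and d_effect == 0:
        return "equivalent"
    if d_cost <= 0 and d_effect >= 0:
        return "dominant"
    if d_cost >= 0 and d_effect <= 0:
        return "dominated"
    nmb = wtp_per_qaly * d_effect - d_cost
    return "cost-effective" if nmb > 0 else "not cost-effective"


if __name__ == "__main__":
    # Same hypothetical result judged at two stated thresholds.
    for wtp in (50_000, 100_000):
        print(f"threshold {wtp:,} USD/QALY: {classify(30.0, 0.0005, wtp)}")
```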
Some of these findings are consistent with the assessment of conceptual issues related to modeling and economic analyses of TB diagnostics by Dowdy et al. [28]. Although they did not focus on IGRAs for diagnosing LTBI, they suggested that current approaches to economic analyses in diagnostic research need to be improved, particularly the need for better defined thresholds for cost effectiveness. Nienhaus et al. also recently performed a systematic review of TB screening strategies [29]. Unlike our study, their objective was to summarize the evidence in order to make a recommendation regarding a preferred strategy for LTBI screening. Although the authors acknowledged differences in input costs, model assumptions, strategies evaluated and outcomes, they still recommended a preferred test strategy, but cautioned that more evidence is needed for "generally accepted inputs for economic analysis".
Cost-effectiveness analyses are an essential tool for the evaluation of health care practices, and are being used more and more widely to prioritize interventions. Our analysis highlights some of the specific methodological issues observed in published CEA of IGRA screening. Although guidelines exist to standardize such analyses, in many cases they were not followed, and some aspects of them may not be relevant to diagnostic CEA. More specific and relevant guidelines are needed, and we suggest the following:
1. The development of standard inputs and assumptions for modeling studies like those included in our review would be useful. Such standard sources could then be used routinely as input data for modelling studies.
2. Standardized approaches to costing should also be encouraged, so that all studies include similar cost components, ideally from a societal perspective that includes the economic impact on patients in addition to the impact on the health system.
3. The choice of primary economic measure also needs careful consideration in these types of studies. Based on our finding of no substantial difference in effectiveness between testing strategies, economic analyses comparing diagnostic strategies for LTBI should focus exclusively on cost and resource implications in the setting in question.
4. Finally, authors should make a much greater effort to present and interpret cost-effectiveness results transparently. For example, standard willingness-to-pay criteria must be used, and the setting clearly stated, when concluding that an intervention is "cost-effective". If the difference in effectiveness is very small, this should be stated explicitly, and any conclusions about cost-effectiveness should be avoided.
Ultimately, these recommendations should improve economic studies that evaluate diagnostic strategies for LTBI, and increase their value for informing individual and public health decisions.