Disutility associated with cancer screening programs: A systematic review

Objectives Disutility quantifies how much a population values intervention-related harms, and so contributes to knowledge of the benefit/harm ratio of cancer screening programs. This systematic review evaluates disutility related to cancer screening using a utility theory framework. Methods Using a predefined protocol, we systematically searched Embase, Medline Ovid, Web of Science, Cochrane, Google Scholar and supplementary sources. The framework grouped disutilities associated with breast, cervical, lung, colorectal, and prostate cancer screening programs into the screening, diagnostic work up, and treatment phases. We assessed the quality of included studies according to relevance to the target population, risk of bias, appropriateness of the measure, and the time frame. Results Out of 2840 hits, we included 38 studies, of which 27 measured disutilities and the others estimated them. Around 70% of studies were of medium to high quality. Measured disutilities and Quality Adjusted Life Years loss were 0–0.03 and 0–0.0013 respectively in the screening phase. Disutilities and Quality Adjusted Life Years loss had similar ranges in the diagnostic work up (0–0.26) and treatment (0.09–0.27) phases. We found no measured disutilities for lung cancer screening and little evidence on disutilities in the treatment phase. Almost 40% of the estimated disutility values were above the range of measured ones. Conclusions Cancer screening programs led to low disutilities in the screening phase, and low to moderate disutilities in the diagnostic work up and treatment phases. These disutility values varied by the measurement instrument applied and were higher in studies of lower quality. Compared with measured values, estimated disutility values tended to overestimate the harms.


Introduction
Cancer is one of the most widespread chronic diseases, with an estimated 18.1 million new cases in 2018 leading to 9.6 million deaths worldwide. [1] Among all malignancies, the diseases with the highest five-year prevalence in 2017 were breast (19%), prostate (12%), colorectal (11%), lung (5.8%), and cervical (4.8%) cancers. [2] Cancer screening programs can help to detect the disease before symptoms appear. Empirical studies show the benefits of cancer screening in decreasing the mortality of the most prevalent cancers. For example, the US Preventive Services Task Force meta-analyses showed a 15-20% reduction in breast cancer mortality with mammography screening and a 20-60% reduction in cervical cancer mortality with cytology-based screening. [3,4] The International Agency for Research on Cancer reported an 18-31% reduction in colorectal cancer mortality due to sigmoidoscopy screening, [5] while the National Cancer Institute reported a 20% reduction in lung cancer mortality among smokers with low dose computed tomography screening. [6] While some cancer screening programs (for breast, cervical, and colon cancers) are widely implemented, there are increasing concerns about possible harms of screening. [5,7] These harms mainly include anxiety, procedural risks, false positive diagnosis, and overdiagnosis (diagnosing cancers that would never have caused any symptoms). [8][9][10][11] Prevention strategies must first of all be safe, and so governmental bodies pay close attention to assessments of possible screening-related harms, which could lead to the withdrawal or delay of cancer screening programs. [12][13][14] Harms can be either assessed from clinical endpoints or represented by patients' values for the outcomes. The preferences of a population for screening programs may be expressed as utility values, while screening-related harms (or loss in health-related quality of life) are expressed as disutility values. [15]
Methods of deriving health state utility values (HSUVs) include direct and indirect methods. Examples of direct approaches include Time Trade Off (TTO), Standard Gamble (SG), Visual Analog Scale (VAS), and Discrete Choice Experiment (DCE). Among the indirect instruments are EuroQol 5 Dimensions (EQ-5D), Short Form 6 Dimension (SF-6D), Rand-36, and Health Utilities Index (HUI). [16] These methods are rooted in utility theories that reflect consumer satisfaction with choices. [17,18] Utilities as measures of patient preferences are widely considered in health decision making, as a component of quality adjusted life years (QALYs) in cost-effectiveness analysis. QALYs are calculated as HSUVs multiplied by the time spent in a certain health state (called the time frame). In theory the disutility value equals "1 - utility", so the larger the disutility related to screening, the lower the total utility for the screened population. QALY losses (disutility value multiplied by time frame) express the general harms of the screening program.
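The relationship between utility, disutility and QALY loss described here can be illustrated with a minimal sketch. The function names and input values are hypothetical and are not taken from any included study; the sketch also assumes utilities on a 0 (dead) to 1 (full health) scale and a time frame expressed in weeks.

```python
def disutility_from_utility(utility: float) -> float:
    """Disutility relative to full health, i.e. 1 - utility."""
    # Assumption: utility lies on the 0 (dead) to 1 (full health) scale.
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility is expected to lie between 0 and 1")
    return 1.0 - utility


def qaly_loss(disutility: float, weeks: float) -> float:
    """QALY loss = disutility x time frame, with the time frame converted to years."""
    if weeks < 0:
        raise ValueError("time frame cannot be negative")
    return disutility * (weeks / 52.0)


# Hypothetical example: a utility of 0.98 experienced for two weeks.
d = disutility_from_utility(0.98)
loss = qaly_loss(d, weeks=2)
print(f"disutility={d:.3f}, QALY loss={loss:.5f}")
```

Because the time frame enters multiplicatively, a small disutility that lasts only days to weeks translates into a very small QALY loss, whereas the same disutility sustained over months or years does not.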
Knowledge of screening-related disutilities is a crucial component in understanding the benefit/harm ratio of cancer screening. However, no systematic review has summarized this evidence so far. [19][20][21] Our study aims to fill this gap by identifying typologies of disutilities and evaluating the reliability and variability of disutility values.

Search and selection
We systematically searched Embase, Medline Ovid, Web of Science, Cochrane, and Google Scholar from their inception to April 2018. The search syntax (S1 File) was developed with input from a qualified librarian. We also searched supplementary sources (S2 File) and the references of the included studies non-systematically.
One researcher (LL) screened and included all abstracts focused on lung, breast, colorectal, cervical or prostate cancers reporting the results of studies of various designs (models, randomized controlled trials, cohort or case-control studies, and systematic reviews). We excluded studies that were: (1) related to other diseases; (2) reporting clinical utility/practice (for example, screening methodology, compliance, clinical diagnosis or treatment); or (3) not full-text papers (meeting proceedings, posters or commentaries). All full texts of included abstracts were double screened (by LL and OM), excluding studies that did not report disutility values or reported disutility values cited from another source (in this case the original source was used). If the authors took a disutility from the literature but applied further assumptions so that the value differed from the cited one, it was included as an estimated value.
All the relevant information from the included studies was extracted by one author (LL) using a data extraction form, and verified by the second author (OM).

Theoretical framework of the review
Referring to the American College of Physicians' value framework for cancer screening, [12] we grouped the reported disutility into three typologies (Fig 1).
a. Screening phase: the disutility is normally derived from the primary screening test because of discomfort during the procedure and has a short-term effect, generally from a few days up to 3 weeks.
b. Diagnostic work up phase: the disutility at this stage is caused not only by physical effects (such as discomfort or complications from follow up or repeated test procedures), but also by psychological effects (such as anxiety and emotional distress about an unfavorable or indeterminate result). The time frame of this stage ranges from a few weeks to a few months. Disutilities in this phase were divided into three groups, reported in the Results as false positive related, procedure-wise, and abnormal result related disutilities.
c. Treatment phase: the disutility in this phase is related to overtreatment of the screening detected (or overdiagnosed) cancer. The time frame generally ranges from several months to years.

Quality appraisal
We developed the quality appraisal criteria based on Ara et al. (Table 1). For each of the criteria the studies were scored as 'good' (score 2), 'fair' (score 1) or 'poor' (score 0). The studies with an overall score of ≥7, 5-6, 3-4, and <3 were rated as high, medium, low, and very low quality respectively. Quality of the included studies was assessed independently by two reviewers (LL, OM), with disagreements resolved by consensus.
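The scoring rule can be written out as a short sketch. It assumes that each study receives one score per criterion group in Table 1 (four groups, so a maximum total of 8) and that the highest band starts at a total of 7; the function name and the example scores are hypothetical.

```python
from typing import Sequence


def rate_quality(scores: Sequence[int]) -> str:
    """Map per-criterion scores (2 = good, 1 = fair, 0 = poor) to an overall rating."""
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each criterion must be scored 0, 1 or 2")
    total = sum(scores)
    if total >= 7:
        return "high"
    if total >= 5:
        return "medium"
    if total >= 3:
        return "low"
    return "very low"


# Hypothetical study: good on relevance and instrument, fair on risk of bias and time frame.
print(rate_quality([2, 1, 2, 1]))
```

Under these assumptions a study needs 'good' on at least three of the four criteria, and at least 'fair' on the fourth, to be rated as high quality.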

Table 1. Checklist for quality appraisal.
Criteria: Description

Relevance to the population's preference
Respondent selection and recruitment: Does this result in a population comparable to that being evaluated?
Inclusion/exclusion criteria: Do the criteria exclude any individuals? (for example, the elderly >80-year-old are often not included in studies)
Relevance of location: Is the population recruited from multiple locations?

Quality assessment-Risk of bias
Sample size: Is the sample size appropriate to reflect the population's preference?
Response rates to the measure used: Are the response rates reported? If so, are the rates likely to be a threat to the validity of the estimated health state utility values?
Loss to follow-up: How large is the loss to follow-up and are the reasons for it given? Are these likely to threaten the validity of the estimates?
Missing data: Are missing values well-reported and addressed? What are the levels of missing data and how are they dealt with? Could this threaten the validity of the estimates?

Appropriate use of instrument
For direct methods (DCE, TTO, SG, VAS): Is the method used appropriately? Are anchors used to describe perfect and worst health (for example, anchored at 1 as equivalent to full health and 0 as equivalent to dead)?
For indirect methods (EQ-5D, SF-6D, SF-36, HUI): Are adequate details of the method provided (for example, the version used, the social tariff applied, etc.)?

Time frame
Is the time frame specified? If so, is it sufficient or reliable to account for the magnitude of harm from screening (when relevant)?
Time frame preferences: Measurement > guideline recommendation or assumption with justification (for example, referring to local clinical practice, or using the time frame from literature reviews) > assumption without justification or no time frame reported (this criterion was considered not applicable for DCE studies).

Data synthesis
We reported the disutility values by typology and cancer type respectively. The study aimed to combine the disutility values in a meta-analysis, provided there were a sufficient number of values (at least ten studies for each covariate [22]) and manageable heterogeneity in methods and outcomes. As these conditions could not be met, a qualitative synthesis was applied. We summarized the mean and confidence interval of disutility values by typology and cancer type; for studies of high and medium quality, we calculated non-reported confidence intervals when the standard deviation was available.
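The derivation of a non-reported confidence interval from a reported standard deviation can be sketched as follows. The review does not state the formula used, so the normal approximation below (mean ± 1.96 × SD / √n) is an assumption, and the function name and input values are hypothetical.

```python
import math


def ci95_from_sd(mean: float, sd: float, n: int) -> tuple[float, float]:
    """Approximate 95% confidence interval for a mean, using a normal approximation."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical study: mean disutility 0.01, SD 0.10, 100 respondents.
lower, upper = ci95_from_sd(0.01, 0.10, 100)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

With a mean disutility close to zero and a comparatively large standard deviation, the lower bound of such an interval can fall below zero, which is worth keeping in mind when interpreting small disutility values.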

Studies selection and overview
Out of 2840 abstracts identified by the database search and from the other sources, 23 studies met the eligibility criteria set by this review. Checking the reference lists of these articles identified another 15 studies, resulting in 38 included papers in total (Fig 2). The level of agreement between the two reviewers was high (kappa coefficient = 0.99).
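For readers unfamiliar with the agreement statistic, the following is a minimal sketch of Cohen's kappa for two reviewers. It assumes that the reported coefficient is Cohen's kappa computed on include/exclude decisions, which the text does not state explicitly; the decision lists are hypothetical and do not reproduce the review's screening data.

```python
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (list(rater_a).count(label) / n) * (list(rater_b).count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical include (1) / exclude (0) decisions on four full texts.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```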
A summary of the characteristics of the included studies is provided in Table 2. About 30% of the studies reported estimated disutility values, while the remaining 27 studies evaluated disutilities with direct or indirect instruments. The data extraction form reporting key characteristics of individual studies is presented in the S1 and S2 Tables.
Around 70% of the studies that measured disutility values were rated as medium or high quality (S3 Table). The studies were ranked lower on risk of bias and time frame than on the other quality criteria (Fig 3).

Result on measured disutility
Disutility in screening phase. Eight studies reported disutility related to cervical, breast and prostate cancer screening (Fig 4). The disutility associated with cervical cancer screening ranged 0-0.02, [24,25] with a QALY loss of 0-0.0006 over a 1 to 2 week time frame. Disutilities related to breast cancer screening measured with VAS varied considerably (0.006-0.2). [26][27][28] Disutility related to prostate cancer screening ranged 0-0.03, with a calculated maximum QALY loss of around 0.0013; most studies concluded that screening attendance carried no disutility. [29][30][31] In summary, based on the evidence from medium to high quality studies, the disutility values due to primary screening attendance were around 0-0.03, and the corresponding QALY loss around 0-0.0013.
due to false positive result in breast cancer screening. [35] In summary, evidence from the above studies showed that the disutility values and calculated QALY loss for false positive results were in the range of 0-0.26.
Procedure-wise disutilities. Eleven studies reported the measured disutility values due to screening procedures for breast, prostate and cervical cancers (Fig 6).
Three studies on breast cancer reported procedure-wise disutilities in the range 0-0.45. [26,28,36] No disutilities from the screening procedure were found for prostate cancer. [29,31] For cervical cancer, one study tested the procedure-wise disutility related to repeated Pap smear and colposcopy referral; the disutility values ranged 0-0.03 and the calculated QALY loss 0-0.0375. [24] Another five studies investigated the differences in disutility between aggressive and conservative protocols for patients with abnormal primary cervical cancer screening results. [37][38][39][40][41] The conclusions were contradictory on whether early colposcopy leads to a loss [38,39] or a gain in utilities. [40] Two studies reported disutilities for either an immediate human papilloma virus (HPV) test [37] or immediate treatment and cytological surveillance versus conservative protocols. [41] Two DCE studies in colorectal cancer screening demonstrated the disutility of unnecessary colonoscopy and of inaccurate or low-sensitivity tests from the general population's perspective, [42,43] while Marshell et al (2009) found no disutility related to colonoscopy use from physicians' preferences. [43] In general, based on the evidence from medium to high quality studies, the procedure-wise disutilities were 0-0.03, and the overall QALY losses were in the range of 0-0.0375.
Abnormal result related disutility. Seven studies reported substantial variability in disutility values due to abnormal results (from a low of 0.004 for HPV positive to a high of 0.4 for cervical intraepithelial neoplasia [CIN] II-III) and in time frames (from 3 to 18 months) (Fig 7). [25][44][45][46][47][48][49]
Disutility in treatment phase. The only study reporting disutility in the treatment phase, by Cantor S.B. et al (2008), investigated couples' preferences for prostate cancer screening outcomes. Disutility values of 0.09-0.27 were reported because of possible side effects (such as impotence, urinary incontinence, and injury) from the screening and consequent treatments. [50]
Summary on measured disutilities by cancer types. Most included studies were on cervical and breast cancers, while no study reported the disutility of lung cancer screening. Similarly, only one study assessed disutility related to the treatment phase, namely overtreatment of prostate cancer (Table 3).
Disutility values by studies' quality and instrument used. Disutility values varied by study quality, with values from high-quality studies being generally lower than those from medium and low quality studies (Figs 4-6). At the same time, disutility values elicited with indirect methods were lower than those from direct methods for screening-phase, false positive and procedure-wise disutilities (Figs 4-6).

Result on estimated disutility
Out of eleven studies that reported estimated disutility values, almost 40% assumed values outside the measured range (Table 4).

Discussion
Our systematic review identified the screening, diagnostic work up and treatment phases as three typologies of disutilities in cancer screening programs. Among these typologies, the diagnostic work up and treatment phases are potentially more important, given their impact on quality of life in terms of both the degree of perceived screening-related harm and its time frame. Considering the analyzed literature on cervical, breast, and prostate cancer, we assume a low level of harm (less than 0.03, resulting in 0-0.0013 QALY loss) from primary screening and a low to moderate level of harm (0-0.26 range for both disutility and QALY loss) from diagnostic work up, from the population perspective. Although women with a false positive diagnosis considered the risk of having one acceptable, [62] disutilities and QALY loss related to false positive results should not be ignored because they are common in clinical practice (for example, 1-11% in screening mammography [63][64][65]). Although this review identified only one study reporting disutility for the treatment phase (0.09-0.27), taking into account the longer time frame for disutilities related to overdiagnosis, we assume a moderate level of perceived harm in the treatment phase. Our review identified studies both measuring and estimating disutilities related to cancer screening. An important finding of our review is that when disutilities are based on assumptions, investigators tend to overestimate the harms; this methodological risk should be considered in cost-effectiveness analyses of cancer screening interventions.
Another important outcome of our review is the application of a novel framework to assess the quality of studies reporting utility values. The quality of studies on HSUVs in the cancer field is rarely evaluated, so methodological improvements in this regard are important. One of the few available examples is the systematic review by Carter et al (2015), who qualitatively evaluated the quality of upper digestive tract cancer studies. [66] In our review, about 70% of studies on five cancers were ranked as medium or high quality. However, the studies had an important limitation in reporting the uncertainty of their findings, with only two of them including confidence intervals. For seven studies it was possible to derive the confidence intervals from the data reported. While this information did not change the conclusions of the review, it further diminishes the apparent importance of disutilities in the screening and diagnostic work up phases, with confidence intervals including negative values obtained in three high and medium quality studies. An interesting observation is that the higher the studies' quality rating, the lower the disutility values they reported.

Methodological considerations on disutility measurement
Variations in utility elicitation are strongly associated with the instruments used in the study. [15,19,67] We found that the indirect methods tended to yield lower disutility values than the direct methods, in most cases showing no disutility at all. The published literature reported that utility values were generally higher with TTO than with SG, and generally lower with VAS/Rating Scale (RS). [68][69][70] In our review, because of the limited number of values retrieved, a head-to-head comparison among TTO, SG and VAS was not feasible. Nevertheless, we observed a trend of higher disutility values from TTO than from VAS/RS. Given this, synthesis and interpretation of disutilities related to cancer screening programs should take into consideration the evaluation instruments used and other methodological differences among the studies. With regard to the DCE, because of the methodological differences, we could not compare the retrieved disutility values with those from other evaluation approaches. Despite a few disadvantages (a potential underlying mismatch with random utility theory, irrational responses, and difficulty in incorporating QALY values), DCE has multiple benefits such as a trade-off between options, less cognitive burden, easier administration, and less measurement error. [16,71] Stolk et al (2010) proposed a hybrid of TTO and DCE, [72] which could maximize the advantages of both methods and enable precise utility elicitation. We believe this might be a promising strategy to follow in future studies assessing the disutility of cancer screening programs.

Impact of the research findings
Our findings suggest that disutilities related to cancer screening arise mainly in the diagnostic work up, though these results are uncertain because confidence intervals were either not reported or wide. The disutilities related to the treatment phase remain largely unexplored. While on a population scale the screening phase is the most important for disutility assessment, since it affects each screened individual, all high quality studies report zero disutility at this stage (four studies report zero and one includes zero in the range of values). If high quality studies are used as a reference point, economic evaluations that rely on estimated disutilities for the screening stage overestimate their values. This will lead to overestimation of the cost-effectiveness ratio of cancer screening programs. Besides, when applying probabilistic sensitivity analyses where utilities and disutilities are assumed to be independent, this assumption will increase the uncertainty regarding incremental cost-effectiveness estimates.

Limitations
This review is subject to several limitations. First, the applied quality criteria need further validation in other studies. In addition, given the very limited data retrieved per typology and the heterogeneity of the results, meta-synthesis was not feasible. [73] Considering the incomparability between DCE and other direct or indirect methods, we could not incorporate these values into the qualitative synthesis. Lastly, our inclusion criteria were limited to English-language articles, so some relevant studies may have been missed.

Research gap
To conclude, further research is needed in the area of disutility assessment. Among the typologies, priority should be given to those with a potentially moderate level of harm (false-positive diagnosis and overtreatment). More studies are necessary to assess disutility related to colorectal, lung and prostate cancer screening. Furthermore, our review found that around 60% of authors estimated the time frame for a given health state in their utility studies; we therefore call for urgent standardization of time frame reporting. Lastly, given that the DCE method allows trade-offs between options, we think it is valuable to conduct more DCE studies in cancer screening programs. Such an approach will help to improve the evidence for cost-utility analysis and further support sound decision making for cancer screening programs.

Conclusion
Cancer screening programs lead to low disutilities in the screening phase, and low to moderate disutilities in the diagnostic work up and treatment phases. These disutility values varied by the measurement instrument applied and by study quality.