Estimating the impact of differential adherence on the comparative effectiveness of stool-based colorectal cancer screening using the CRC-AIM microsimulation model

Background Real-world adherence to colorectal cancer (CRC) screening strategies is imperfect. The CRC-AIM microsimulation model was used to estimate the impact of imperfect adherence on the relative benefits and burdens of guideline-endorsed, stool-based screening strategies. Methods Predicted outcomes of multi-target stool DNA (mt-sDNA), fecal immunochemical tests (FIT), and high-sensitivity guaiac-based fecal occult blood tests (HSgFOBT) were simulated for 40-year-olds free of diagnosed CRC. For robustness, imperfect adherence was incorporated in multiple ways and with extensive sensitivity analysis. Analysis 1 assumed adherence from 0%-100%, in 10% increments. Analysis 2 longitudinally applied real-world first-round differential adherence rates (base-case imperfect rates = 40% annual FIT vs 34% annual HSgFOBT vs 70% triennial mt-sDNA). Analysis 3 randomly assigned individuals to receive 1, 5, or 9 lifetime (9 = 100% adherence) mt-sDNA tests and 1, 5, or 9 to 26 (26 = 100% adherence) FIT tests. Outcomes are reported per 1000 individuals compared with no screening. Results Each screening strategy decreased CRC incidence and mortality versus no screening. In individuals screened between ages 50–75 and adherence ranging from 10%a-100%, the life-years gained (LYG) for triennial mt-sDNA ranged from 133.1–300.0, for annual FIT from 96.3–318.1, and for annual HSgFOBT from 99.8–320.6. At base-case imperfect adherence rates, mt-sDNA resulted in 19.1% more LYG versus FIT, 25.4% more LYG versus HSgFOBT, and generally had preferable efficiency ratios while offering the most LYG. Completion of at least 21 FIT tests is needed to reach approximately the same LYG achieved with 9 mt-sDNA tests. Conclusions Adherence assumptions affect the conclusions of CRC screening microsimulations that are used to inform CRC screening guidelines. LYG from FIT and HSgFOBT are more sensitive to changes in adherence assumptions than mt-sDNA because they require more tests be completed for equivalent benefit. At imperfect adherence rates, mt-sDNA provides more LYG than FIT or HSgFOBT at an acceptable tradeoff in screening burden.


Introduction
Screening and subsequent treatment of pre-symptomatic neoplasia has been shown to reduce the incidence and mortality of colorectal cancer (CRC) [1,2]. Widely endorsed CRC screening modalities include structural examinations (e.g., colonoscopy, computed tomography colonography [CTC], and flexible sigmoidoscopy [SIG]) and stool-based tests (e.g., fecal immunochemical testing [FIT], high-sensitivity guaiac-based fecal occult blood tests [HSgFOBT], and multitarget stool DNA [mt-sDNA]). Ideally, randomized clinical trials (RCTs) would be conducted to evaluate the comparative effectiveness of different CRC screening strategies. However, obtaining long-term clinical trial data for all the various permutations of CRC screening strategies (i.e. screening modality, start age, stop age, and interval) is not feasible. Therefore, outcomes of microsimulation models have been used by clinical societies and expert panels (i.e., American Cancer Society [ACS] and US Preventive Services Task Force [USPSTF]), in conjunction with systematic evidence reviews, to determine appropriate CRC screening strategies [3,4]. Three CRC microsimulation models have been independently developed with funding from the Cancer Intervention and Surveillance Modeling Network (CISNET) Colorectal Working Group [5][6][7][8]. In 2016, CISNET published a set of model-recommendable screening strategies that were based on a combination of modeling results and consultation with USPSTF members [4].
The CISNET analyses used to inform USPSTF (and other) guidelines assume perfect (100%) adherence with all CRC screening, follow-up, and surveillance procedures over each individual's lifetime [4,9]. Therefore, the recommendations are based on what is theoretically, rather than practically, achievable. Although assuming perfect adherence to all screening strategies provides the opportunity for impartial comparison across the analyzed modalities, this assumption inherently limits real-world application of the estimated outcomes. Indeed, perfect adherence has been recognized by the CISNET group and others as a limitation to traditional simulation analyses [9][10][11]. Although currently sparse, real-world data referent to longitudinal CRC screening demonstrate variable participation rates both across and within populations, with none approaching the modeled assumption of 100% [12][13][14][15]. To more realistically assess the relative benefits and burdens of stool-based screening in microsimulation modeling, imperfect adherence needs to be taken into consideration.
A new CRC microsimulation model, Colorectal Cancer and Adenoma Incidence and Mortality Microsimulation Model (CRC-AIM), was developed based on previously reported parameters of CRC-SPIN [16,17]. Because of the substantial differences among screening modalities (e.g. bowel prep, invasiveness, time off work, adverse events), this analysis is limited to stool tests, which have similar burdens and no direct complications. The objective of the current analyses was to use CRC-AIM to estimate the impact of imperfect adherence on the relative benefits and burdens of guideline-endorsed stool-based screening strategies.

Microsimulation model
CRC-AIM natural history, which is based on a published methodology [18][19][20], is used to model the sequence of adenoma to carcinoma progression in unscreened patients, as briefly described herein. As individuals age, the risk of adenomas generally increases. Adenomas may grow and transition to preclinical cancer, which may transition to symptomatic CRC. CRC screening facilitates the removal of adenomas and potential early detection of preclinical CRC [4]. The ability of a stool-based CRC screening test to detect an adenoma or preclinical CRC is dependent on test performance (i.e., sensitivity) and test completion (i.e., adherence). Positive screening tests are followed-up with a diagnostic (i.e. follow-up) colonoscopy. Further description of the natural history and screening components of the CRC-AIM model is provided in the S1 Appendix. Regressions comparing CRC-AIM versus CISNET models for total colonoscopies, life-years gained (LYG), CRC incidence reduction and CRC mortality reduction for each stool screening modality (mt-sDNA, FIT, and HSgFOBT) are provided in Figs B1, B2, and B3 in S2 Appendix. Additionally, CRC-AIM's stool screening outcomes were used in conjunction with CISNET's optimal strategy-selection algorithm, under identical conditions. Detailed methods are provided in S2 Appendix. Full details of the natural history and screening components, as well as additional validation results evaluating qualitative and quantitative outputs to those from CRC-SPIN, SimCRC, and MISCAN [4,9], are available elsewhere [17]. Additional queries related to collaboration or model availability can be made at collabora-te@crcaim.com.
For the primary analyses, all CRC screening test performance assumptions (sensitivity, specificity, and complications) were identical to the CISNET modeling analyses used to inform the 2016 USPSTF guideline recommendations [4,9].

CRC screening outcomes
Predicted CRC screening outcomes were simulated for 4 million average-risk individuals born in 1975. Efficacy was measured by number of LYG compared with no CRC screening. Number of colonoscopies was used as a proxy for burden and harms. The number of total stool tests, complications from colonoscopies, CRC cases, CRC deaths, life-years with CRC, incidence reduction, and mortality reduction were analyzed as additional outcomes. All outcomes were reported per 1,000 individuals free of diagnosed CRC at age 40.
For each stool-based screening modality, up to 18 strategies were evaluated, consisting of a unique combination of screen interval (1, 2 y FIT/HSgFOBT, 2, 3 y mt-sDNA), age to begin screening (45, 50, 55 y), and age to end screening (75, 80, 85 y) ( Table 1). Detailed screening outcomes are presented from selected stool-based screening strategies, namely annual FIT, annual HSgFOBT, and triennial mt-sDNA for individuals screened between ages 50-75 and 45-75, since these are the strong and qualified USPSTF and American Cancer Society (ACS) intervals and age recommendations [3,4]. Of note, there are ongoing discussions among organizations that develop CRC screening guidelines, including USPSTF and ACS, about the appropriateness of lowering the screening age to 45 for average-risk adults. Screening outcomes from other stop-start age combinations can be accessed at https://github.com/ CRCAIM/CRC-AIM-Public.

Adherence analyses
Modeling of real-world CRC screening adherence is complicated and, at present, there is no universally accepted methodology. To sufficiently evaluate the impact of adherence on CRC screening outcomes, multiple adherence analyses were performed, described below. Since the purpose of these analyses was to determine the impact of adherence only on stool-based screening, perfect adherence to diagnostic and surveillance colonoscopies was assumed for all analyses in accordance with CISNET models [4,9]. Analysis 1. The purpose of analysis 1 was to evaluate the comparative effectiveness of stool-based testing screening strategies assuming a spectrum of adherence rates for each modality. Adherence was set by assuming a fixed annual likelihood to comply with each stool-based screening strategy ranging from 0% to 100%, in 10% increments. It was assumed that individuals were offered a stool-based test every year unless they were not due for screening. Analysis 2. The purpose of Analysis 2 was to estimate the comparative impact of differential adherence to each stool test using real-world first-round screening participation rates and applying them as a fixed annual likelihood to comply. Due to the heterogeneity of reported real-world adherence, several different imperfect adherence scenarios were evaluated. The base-case imperfect adherence rates were 40% FIT [21] vs 34% HSgFOBT [21] vs 70% mt-sDNA [22]. Adherence of 40% for FIT was loosely estimated based on pooling participation rates reported in a meta-analysis [21] resulting in 42% adherence (range of individual studies was 26% to 62%). Adherence for HSgFOBT was found to be approximately 16% lower relative to FIT in a meta-analysis [21], and therefore was analyzed as 34% (40% FIT / 1.16 = 34%). Adherence of 70% for mt-sDNA was based on a retrospective cohort analysis [22].
Efficient frontier of stool-based screening strategies. Using a similar methodology to the CIS-NET Colorectal Cancer Working Group (CWG) [4], efficient frontiers were used to visualize tradeoffs between benefits and burdens, and tables were used to summarize results. Two different sets of benefit/burden pairs were evaluated. The first set of outcomes for efficient frontier generation used LYG and the number of required colonoscopies. The second set of outcomes

PLOS ONE
Impact of adherence on CRC stool-based screening outcomes using the CRC-AIM microsimulation model used LYG and patient hours related to the screening process. Based on previously published CISNET assumptions, patient time related to the screening process was assumed to be 16 hours for colonoscopy, 112 hours for complications due to colonoscopy, and 1 hour for FIT [23]. Patient time to complete mt-sDNA was assumed to be the same as FIT and for HSgFOBT was assumed to be 2 hours: ½ hour was assumed for each of the 3 evacuation/collection/storage processes required for HSgFOBT plus ½ hour for instructions/unboxing/setup/mailing. Each benefit/burden pair were plotted to generate efficient frontiers. Strongly dominated strategies (i.e, strategies that required more colonoscopies or patient hours for fewer LYG) were discarded. The incremental number of LYG per 1000 (ΔLYG) and incremental number of colonoscopies per 1000 (ΔCOL) or patient hours related to the screening process (Δpatient hours) were computed. The efficiency ratio (ΔCOL/ΔLYG or Δpatient hours/ΔLYG) for each remaining strategy was calculated. Strategies with fewer LYG but a higher efficiency ratio than another strategy were discarded as weakly dominated. The efficient frontier was the line that connected the efficient strategies; strategies that had LYG within 98% of the efficient frontier were considered near-efficient. Typically, a recommendable strategy must be efficient or near efficient, offer LYG within 90% of a benchmark COL strategy, must have an efficiency ratio (e.g. incremental COL per incremental LYG) at or below the range of CISNET model-derived acceptable thresholds, based on the selected benchmark COL strategy (e.g. incremental ratios between 39-65 for a 10-year benchmark COL when the screening window is , and, if all criteria are met, offer the most LYG [4]. However, in the context of this analysis, %LYG of the benchmark COL is not applicable since it is assumed that real-world adherence is imperfect (including that of the benchmark COL, which is not evaluated), and for simplicity, we assume efficiency ratio thresholds of 39-65 irrespective of a benchmark COL strategy. Efficient frontier plots for all strategies were generated for perfect adherence rates (to replicate CISNET results [4,9]) and imperfect adherence rates. Calculated frontier outcomes and efficiency ratios were then tabulated for all strategies, from which a subset could be taken to assess outcomes for a particular screen window [4,9]. Mimicking an alternate approach to this algorithm [24], frontier outcomes were separately calculated for all modalities for a given screen window (e.g., , as opposed to including all screen windows on the same frontier prior to calculating frontier outcomes [4,9], since it is presumed a priori that only one start-stop age pair would be implemented in practice. Analysis 3. The purpose of analysis 3 was to evaluate the comparative effectiveness of stool-based testing assuming varying numbers of completed stool-based tests. This analysis represents individuals who are randomly adherent during a screening window from ages 50-75. Simulated individuals were randomly assigned to up to 1, 5, or 9 (9 = 100% adherence) mt-sDNA tests and up to 1, 5, or 9 to 26 (26 = 100% adherence) FIT tests during the lifetime screening window. Simulated individuals may have completed fewer than their assigned tests if they died or had a positive stool test. Only recommended screening strategies of triennial mt-sDNA and annual FIT were included in this analysis; therefore, no efficient frontiers were generated [3]. An additional analysis assuming a screening window from ages 45-75 was also performed, with a corresponding set of randomly completed tests (e.g., 31 = 100% adherence for FIT and 11 = 100% adherence for mt-sDNA).

Sensitivity analysis
Test performance sensitivity analyses. The same screening characteristics for FIT, mt-sDNA, and HSgFOBT were used in the CRC-AIM primary analyses and the CISNET models [4,9]. These screening characteristics for FIT and mt-sDNA were derived from data generated in a cross-sectional study of 9989 participants (aged 50-84 years) at average risk of CRC who underwent colonoscopy, mt-sDNA, and FIT screening (DeeP-C study; clinicaltrials.gov identifier, NCT01397747) [25]. Adenoma findings in the published report by Imperiale et al [25] were not distinguished by size or location, but rather were categorized as advanced (�10 mm) or non-advanced adenomas. Thus, the CISNET models and the CRC-AIM model primary analyses used the published sensitivity of advanced adenomas as a proxy for the sensitivity of adenomas �10 mm and non-advanced adenomas as a proxy for the sensitivity of adenomas 1-5 mm and 6-9 mm combined [9]. Additional outcomes for adherence analyses #1 and #3 were generated using screening characteristics for FIT and mt-sDNA derived from more granular estimates of diagnostic sensitivity, stratified by adenoma size and location (rectal, distal, proximal), and age-based specificity data collected in DeeP-C [25] to which the study sponsor had access. The screening characteristics for the sensitivity analysis are shown in Table 2. All other aspects of the CRC-AIM modeling in the sensitivity analyses were the same as the primary analyses. Finally, for informational purposes, sensitivity to detect adenomas based on size alone for FIT and mt-sDNA, derived from DeeP-C (clinicaltrials.gov identifier, NCT01397747) [25], are in S1 Table. Efficiency frontier sensitivity analyses. Sensitivity analyses of efficient frontier plots (analysis #2) were conducted using 50% FIT vs 43% HSgFOBT vs 70% mt-sDNA adherence and 60% FIT vs 52% HSgFOBT vs 70% mt-sDNA adherence based on existing data referent to test-specific adherence rates [14,[26][27][28][29][30][31].
An additional sensitivity analysis was performed to evaluate an alternative approach to modeling imperfect adherence. CISNET has modeled imperfect adherence by splitting the population between those that are 100% adherent to screening versus those who are 0% adherent to screening (CISNET fixed split adherence approach) [32][33][34]. This approach was replicated to evaluate its impact on base-case adherence assumptions for analysis #2.

Analysis 1: Spectrum of adherence
Analysis #1 assumed a spectrum of adherence in 10% increments for stool-based screening strategies. Each screening strategy reduced CRC-related incidence and mortality compared with no

PLOS ONE
Impact of adherence on CRC stool-based screening outcomes using the CRC-AIM microsimulation model screening. When the screening window was between ages 50−75, for triennial mt-sDNA, the LYG from 10% to 100% adherence ranged from 133.1 to 300.0, for annual FIT ranged from 96.3 to 318.1, and for annual HSgFOBT ranged from 99.8 to 320.6 (S2 Table). LYG for FIT was more sensitive to per-unit change in adherence rates ([318.1−96.3]/[100%−10%] = 2.5 LYG/unit change) than mt-sDNA (1.9 LYG/unit change). LYG for HSgFOBT was also more sensitive to per-unit change (2.5 LYG/unit change) versus mt-sDNA. A matrix of relative LYG comparisons between mt-sDNA and FIT for all adherence combinations (Fig 1) illustrates regions of differential adherence where the tests have similar (<10%) or dissimilar (�10%) differences in LYG. At perfect adherence, the LYG and reductions in CRC-related incidence and mortality were highest for annual HSgFOBT, followed by annual FIT and triennial mt-sDNA (Fig 2). The total number of stool tests was lower with mt-sDNA vs FIT and HSgFOBT (Fig 2). Total required colonoscopies was similar between mt-sDNA and FIT and was slightly higher for HSgFOBT (Fig 2). Other screening outcomes also vary across the spectrum of adherence assumptions and assumed screening windows (S2 Table and Fig 2). Mt-sDNA had the highest number of colonoscopies, whereas FIT had the highest number of stool tests (Fig 2).  Efficient frontier of stool-based screening strategies. The efficient frontiers by LYG relative to number of colonoscopies of stool-based screening strategies assuming perfect adherence or base-case imperfect adherence rates are shown in Fig 3. When shifting from perfect to base-case imperfect adherence rates, strategies that are on or off the efficient frontier can change, and efficiency ratios can substantially change. (S3 and S4 Tables). Adherence-related changes to efficient frontiers impact model-recommendable strategies. When the screen window is fixed at 50-75 followed by recalculating efficiency comparisons, the perfect adherence model-recommended strategy is annual FIT (incremental ratio 12.5 COL/LYG, as opposed to near-efficient biennial mt-sDNA and efficient annual HSgFOBT, which offer more LYG but exceed the efficiency ratio [slope] threshold of 39-65 at incremental ratios of 196.6 COL/LYG and 142.8 COL/LYG, respectively) and the base-case imperfect adherence recommended strategy is biennial mt-sDNA (incremental ratio of 12.8 COL/LYG, with no other strategies offering more LYG) (S5 Table). Similar results were obtained for a screening window from 45-75 (S5 Table).

Fig 3.
Life-years gained for individuals 40 years of age with stool-based tests by A) number of colonoscopies assuming perfect (100%) adherence or B) base-case imperfect adherence rates (40% FIT vs 34% HSgFOBT vs 70% mt-sDNA) and by C) patient hours related to the screening process assuming perfect (100%) adherence or D) base-case imperfect adherence (40% FIT vs 34% HSgFOBT vs 70% mt-sDNA). Results shown are per 1000 individuals free of diagnosed colorectal cancer at age 40 and screened starting at age 45 or 50 and ending at age 75 or 80 receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. NE, near-efficient. The efficient frontiers by LYG relative to patient hours related to the screening process assuming perfect adherence or base-case imperfect adherence rates are shown in Fig 3. When shifting from perfect to base-case imperfect adherence rates, all but one of the strategies that are on or off the efficient frontier do not change (S6 and S7 Tables). Although there are no pre-established efficiency thresholds for this analysis, an absolute upper bound of an incremental 8,766 screening process hours per incremental LYG could be used, representing an equivalent tradeoff of burden (365.25 days/year x 24 hours/day = 8,766 hours) and benefit, with other thresholds consisting of a fraction of this value. Assuming a screening window from 50-75, followed by recalculating efficiency comparisons, the annual HSgFOBT strategy on the frontier would exceed the threshold with 16,807.5 incremental patient hours per incremental LYG compared with biennial mt-sDNA, which has an efficiency ratio of 350.0 hours per LYG and could be considered an optimal strategy (S8 Table). Similarly, at base-case imperfect adherence rates, mt-sDNA generally had preferable efficient ratios while offering the most LYG for a given screen window (S8 Table). There were no HSgFOBT strategies that were efficient or near-efficient. Similar results were obtained for a screening window from 45-75 (S8 Table).

Analysis 3: Varying numbers of completed tests
Adherence analysis #3 assumed varying numbers of completed stool-based tests. Life-years gained per 1000 individuals screened between ages 50-75 was greater with up to 1, up to 5, or up to 9 mt-sDNA tests (68.2, 217.8, and 294.8, respectively) compared with equivalent numbers of FIT tests (41.4, 143.1, and 201.2; Table 3). An individual would have to take up to 21 FIT tests to reach approximately the same LYG as an individual who took up to 9 mt-sDNA tests (Fig 4). For up to 1, 5, or 9 tests, the incremental ratios of COL/LYG were favorable for mt-sDNA compared with FIT ( Table 3) and are below the CISNET-accepted ratios of 39-65.
The reductions in CRC-related incidence and CRC-related mortality were also greater with up to 1, up to 5, or up to 9 mt-sDNA tests compared with equivalent numbers of FIT tests, whereas the number of total stool tests was similar between mt-sDNA and FIT when comparing equivalent numbers of tests ( Table 3).
The pattern of results for analysis #3 in individuals screened between ages 45-75 was similar to those of individuals screened between ages 50-75 (S9 Table). An individual would have

Sensitivity analyses
Test performance sensitivity analyses. When granular adenoma size and location sensitivity values and age-based specificity values are used from DeeP-C (clinicaltrials.gov identifier, NCT01397747) [25], results from analysis #1 and #3 only modestly change. In analysis #1, when assuming perfect adherence in individuals screened between ages 50-75, LYG for triennial mt-sDNA increased from the primary analysis by 1.3% (303.8 vs 300.0) and total colonoscopies decreased by 4. 3% (1872 vs 1957). LYG for annual FIT decreased by 1.0% (315.0 vs 318.1) and total colonoscopies decreased by 5.4% (1927 vs 2037) (S2 and S10 Tables). At basecase imperfect adherence rates in individuals screened between ages 50-75, mt-sDNA resulted in a 19.6% increase in LYG (288.1) versus FIT (241.0; S4 Fig). In analysis #3, for up to 1, 5, or 9 tests, the incremental ratios of COL/LYG remained favorable for mt-sDNA compared with FIT and decreased by 4.6%, 6.5%, and 10.2%, respectively, from the primary analysis (Table 3 and S11 Table). The number of FIT tests an individual would have to take to reach approximately the same LYG as an individual who took up to 9 mt-sDNA tests increased from 21 to 22 FIT tests (S5 Fig).

PLOS ONE
Impact of adherence on CRC stool-based screening outcomes using the CRC-AIM microsimulation model Efficiency frontier sensitivity analyses. For scenarios assuming 50% FIT vs 43% HSgFOBT vs 70% mt-sDNA adherence or 60% FIT vs 52% HSgFOBT vs 70% mt-sDNA adherence, the efficient frontiers showed only modest changes from the base-case imperfect adherence rate analysis. Both FIT and mt-sDNA continued to have efficient strategies while HSgFOBT did not (S6 Fig). When the screen window is fixed at 50-75 followed by recalculating efficiency comparisons, the recommendable strategy is biennial mt-sDNA with an efficiency ratio of 14.7 assuming 50% FIT vs 43% HSgFOBT vs 70% mt-sDNA adherence and 18.1 assuming 60% FIT vs 52% HSgFOBT vs 70% mt-sDNA adherence. Across the two sensitivity scenarios, only one strategy changed from (near)-efficient to not-efficient (triennial mt-sDNA, 50-75); this change occurred in the 60% FIT vs 52% HSgFOBT vs 70% mt-sDNA adherence scenario (S5 Table).
For the efficient frontiers by LYG relative to patient hours related to the screening process, the results were similar to those from the base-case imperfect adherence rate analysis and the strategies classified as either efficient or near-efficient were unchanged (S6 Fig). When the screen window is fixed followed by recalculating efficiency comparisons, conclusions regarding optimal strategy did not change (S8 Table).
Sensitivity analyses for analysis #2 using the CISNET fixed split adherence approach are shown in S7 Fig. The number of efficient strategies increases for mt-sDNA and decreases for FIT. HSgFOBT does not produce efficient strategies in either modeling approach. The CISNET fixed split adherence approach is more favorable to mt-sDNA strategies, as more of these strategies are on (or approach) the efficient frontier relative to the primary analysis (Fig 3 and S7 Fig).

Discussion
An endorsed concept in cancer screening is the "best test is the one that gets done" [35,36]. As others have noted, tests also need to be done well [37]. Although non-invasive screening tests provide options to individuals who do not want to be screened by more invasive methods, the efficacy of these tests is largely dependent upon consistently completing them when required. Relatively poor test completion rates are one reason why HSgFOBT has been suggested to be replaced by FIT in particular US settings where HSgFOBT is still used [38]. As demonstrated in this report, harm-benefit tradeoffs and optimal test selection directly depend upon particular adherence patterns. Using the CRC-AIM microsimulation model [16,17], a robust and comprehensive analysis was conducted to determine the impact of imperfect adherence on stool-based CRC screening outcomes for average-risk individuals. The results demonstrate that imperfect adherence has a substantial impact on the relative benefits and harms of guideline-endorsed, stool-based CRC screening strategies. The trade-offs between burden and benefit change when adherence rates are assumed to be less than 100%, shifting 4 of the 24 stoolbased screening strategies from dominated to efficient or near-efficient (annual FIT 50-80; biennial and triennial mt-sDNA 45-75; triennial mt-sDNA 45-80). The model-recommended strategy also shifts from an annual FIT to a biennial mt-sDNA. At base-case imperfect adherence, mt-sDNA strategies are efficient/near-efficient, have efficiency ratios below accepted thresholds, and offer more LYG than FIT or HSgFOBT. When assuming patients are randomly adherent to an equal number of stool-based screening tests, more than twice the number of FIT tests are needed to match the equivalent amount of LYG as mt-sDNA. The LYG from FIT and HSgFOBT are more sensitive to changes in adherence assumptions than mt-sDNA because their lower neoplasia sensitivity requires more tests be completed for equivalent benefit.
Imperfect adherence scenarios can be compared with perfect adherence scenarios to assess the changes to CRC incidence and mortality reduction as a function of adherence assumptions. For example, when screening from ages 50-75, perfect adherence for annual FIT results in a 68.4% CRC incidence reduction and 76.3% CRC mortality reduction compared to no screening; these numbers fall to 54.1% and 63.1%, respectively, when assuming an annual likelihood to comply of 50%. Alternatively, if one assumes 50% of the population is 100% adherent, and 50% is 0% adherent, the corresponding incidence and mortality reduction drop to 34.2% and 38.1%, respectively. Disparities in these outcomes highlight a few important points: 1) from a modeling standpoint, there is a need to utilize realistic adherence assumptions and mechanisms (which are currently unknown, from a longitudinal standpoint), since using the same adherence value but different mechanisms can result in very different aggregated patient outcomes; and 2) from a patient-outcomes standpoint, consistent screening is critical (regardless of modeling mechanism used) for CRC incidence and mortality reduction.
Other modeling studies have observed substantive influence from imperfect CRC screening adherence on the simulated outcomes. Similar to the current analyses, another microsimulation model found that when reported adherence rather than perfect adherence was assumed, the LYG and reductions in CRC incidence and mortality were higher with triennial mt-sDNA compared with annual FIT or HSgFOBT [10]. In Zauber et al 2008 [39], sensitivity analysis showed that when adherence was lowered from 100% to 50%, the LYG became higher for FIT and HSgFOBT compared with colonoscopy (mt-sDNA was not included in the analysis). Sensitivity analyses in Knudsen et al 2010 [33] demonstrated that the optimal CRC screening modality can change when adherence is varied. Adherence to less-invasive tests appears to be an important, yet under-appreciated, factor when assessing the relative cost-effectiveness of CRC screening [40].
In routine clinical practice, adherence to test-specific CRC screening strategies is imperfect and varies across patient subgroups [12][13][14][15]. Even for a single individual, the likelihood to adhere may change over time because of a variety of factors (e.g., education about screening, life events, etc.). Furthermore, patients may switch between different screening modalities and adherence may decline over time. Although imperfect adherence in real-world settings is recognized [4], perfect adherence is usually used in CRC screening microsimulations because there is no universally accepted methodology to model imperfect adherence and all of the methods used have challenges. One method often utilized by CISNET is to assume a proportion of a population is perfectly adherent to a test, and the remaining proportion are never adherent (fixed split adherence approach) [32][33][34]. This effectively results in a weighted average of outcomes for 100% adherence and natural history (i.e., never-adherent) [33]. Commonly used values for this imperfect adherence assumption generally range between 50%-60%, loosely corresponding to the proportion of individuals up-to-date with CRC screening [32][33][34]. A challenge of this approach is that the proportion of individuals up-to-date for a CRC test does not necessarily correspond to constant, perfect longitudinal adherence, it only corresponds to a proportion of individuals who have previously undergone a screen at some point within a screen modality's testing interval, ignoring whether the screen was taken on time (e.g. at age 60, the proportion who took a colonoscopy at age 50, rather than the proportion that took the test over the last 10 years). Additionally, evidence suggests that it is incorrect to assume that individuals who previously took a screening test will consistently take it in future years [41,42]. The fixed split adherence approach also assumes that individuals who are not adherent will never take a screening test, which is also unrealistic [27].
The current analysis models imperfect adherence for stool tests using cross-sectional, firstround, participation rates. This approach assumes that if an individual does not adhere to a test, that the individual has the same fixed probability to adhere the following year and that for individuals that do adhere and the test is negative, the individual has the same fixed annual likelihood to adhere in future screen opportunities. Over infinite time, there are no individuals who never or perfectly adhere. Many more individuals get screened using this approach compared with the fixed split adherence approach.
The most realistic simulation of adherence may be somewhere in between the two methods. One approach has been to model adherence as a mixed distribution of individuals who never adhere, perfectly adhere, and imperfectly adhere, with the imperfectly adherent group further sub-stratified into different levels of adherence likelihood [39,43]. Given a lack of longitudinal adherence data to act as calibration and validation targets, the inter and intra-distributions of these groups are difficult to quantify, especially if differential adherence is assumed across strategies.
Despite the challenges in modeling real-world adherence, the strength of this analysis is that multiple modeling approaches were conducted and many sensitivities were explored to address the challenges. For analysis #1, a spectrum of adherence in 10% intervals was applied. Although this approach does not reflect a real-world population, it is a simplified and thorough way to show the impact of imperfect adherence. For analysis #2, the goal was to apply real-world first-round differential adherence rates. Therefore, we conducted a critical assessment of meta-analyses and retrospective cross-sectional data, which, although limited by some caveats, indicated that actual adherence for annual FIT is likely~40% based on a weighted average (accounting for heterogeneity) of 42% adherence from studies included in a metaanalysis [21]. This rate is similar to that determined from a large database of Veteran's Affairs data in the US (42.6%) [38]. Based on meta-analysis data, annual HSgFOBT is~16% [21] to 21% [31] relatively lower than FIT, thus, HSgFOBT adherence rates of 16% lower relative to FIT were assumed. The 70% adherence rate with triennial mt-sDNA was determined based on a large (n = 368,494) retrospective analysis of Medicare beneficiaries from diverse geographic regions [22].
The higher adherence rate with mt-sDNA compared with FIT and HSgFOBT is likely related to multiple factors. Patient navigation programs are always available for patients using mt-sDNA [22] and such programs have been shown to increase adherence [44][45][46][47]. Differences in the practical aspects of completing the stool-based tests may also play a role. For example, HSgFOBT requires diet modification and samples from multiple bowel movements, unlike FIT and mt-sDNA [48]. Although the retrospective analysis used to determine the mt-sDNA adherence was conducted in the Medicare population and older patients are known to be more adherent to CRC screening [12,49], within the analyzed population there were no age-related trends for mt-sDNA adherence nor were any trends observed based on the type of Medicare coverage [22].
A sensitivity analysis of analysis #2 was conducted using adherence rates of 50% and 60% for FIT (and corresponding 16% relatively lower rates for HSgFOBT). Adherence values of 50% and 60% were based on a weighted average (accounting for heterogeneity) of 56% adherence for first-round participation rates reviewed by the US Multi-Society Task Force on Colorectal Cancer [26]. Adherence of approximately 50% has been observed by Kaiser Permamente in a retrospective cohort study of 323,349 members [14]. The Kaiser Permanente analysis was within the context of an organized screening program that included patient navigation [14]. Such programs are not universally available to FIT patients in the US. The highest reported rates of FIT adherence were reported in Dutch studies (~60%) [27][28][29][30]. Some of the Dutch studies were in the context of a CRC screening program and may not be reflective of a US population. Socioeconomic, cultural, and geographic factors (i.e., patients were selected only from municipalities), as well as national policies may have contributed to high adherence rates [50]. Results from the sensitivity analyses using higher FIT and HSgFOBT adherence rates are in agreement with the base-case imperfect adherence analysis in that biennial mt-sDNA is the model recommendable strategy within a 50-75 year screening window. "Screening fatigue", defined as missing a test after receiving several negative CRC tests, is a potential problem that reduces the effectiveness of screening [51]. A CRC model estimated that screening fatigue could considerably reduce screening effectiveness [52]. In analysis #3, individuals are only randomly adherent to a fixed number of completed stool tests, similar to a patient with screening fatigue who is only willing to take a limited number of stool tests. In this context, a patient would need to be willing to take approximately 21 FIT tests to produce the same LYG as 9 mt-sDNA tests. A retrospective analysis of individuals with commercial insurance or Medicare in the US found that only 268/17,174 (1.56%) of the individuals who received any annual FIT/HSgFOBT test (with no sigmoidoscopy) received at least 1 annual FIT/FOBT test per year over 10 years [53]. This excludes individuals who had a follow-up colonoscopy from a false-positive. Therefore, it is unlikely that many patients would complete 21 negative FIT tests in a 25-year period.
Efficient frontiers are a way to evaluate the trade-offs between the benefits and harms from screening tests. Although the CISNET modeling group commonly uses total colonoscopies for efficient frontier analysis, they have also recognized the need to assess alternate metrics for patient burden when recommending modeling strategies [4]. The current analysis evaluated efficient frontiers not only from a healthcare burden perspective (e.g., LYG relative to number of colonoscopies), but also from a patient burden perspective (e.g., LYG relative to number of patient hours related to the screening process). At base-case imperfect adherence rates, mt-sDNA generally had preferable efficient ratios while offering the most LYG when compared with FIT or HSgFOBT from both the healthcare and patient burden perspectives. Looking at other measures of burden may be complementary to using number of colonoscopies to provide an additional perspective for efficiency frontiers.
Other important population-level decisions are also based on tradeoffs in harms and benefits, including when screening should start and end. Assuming perfect adherence, two of three CISNET models found that the age to start screening could be lowered to age 45 based on harm-benefit tradeoffs for colonoscopy and FIT [9]. These tradeoffs would shift under imperfect adherence assumptions. Although the goal of any screening program is maximizing adherence, perfect adherence is unlikely in real-world clinical settings. As such, policy-related decisions should be explored under the lens of real-world adherence assumptions and clinical care patterns.
The current analysis is limited to assessing the implications of imperfect adherence related to stool tests, a single class of CRC screening tests, for an average-risk population. Additional analysis is needed to understand if screen recommendations for other screening classes (colonoscopy, sigmoidoscopy, sigmoidoscopy combined with stool tests, or computed tomographic colonography) would change based on imperfect adherence. Additional analysis could also be extended to subpopulations stratified by racial/ethnic groups, urban/rural location, type of insurance coverage, or sex. Such analyses would need to be tailored to a group's specific CRC risk and adherence patterns. Also, due to the chosen perspective of the analysis, the current analysis does not take into account costs and health-state utilities associated with screening, as would be considered in cost-effectiveness analysis. The scope of the analysis was to show the influence of imperfect adherence on comparative effectiveness and tradeoffs in harms and benefits, outcomes important to US-policy-makers. Future applications of the model will consider cost-effectiveness evaluations from both payer and societal perspectives.
The CRC-AIM model is limited in that, like the CISNET models, it does not account for serrated polyps, the natural history of which differs from the typical adenoma-carcinoma sequence, and may account for up to 30% of CRC [54,55]. Sessile serrated polyps are detected with good sensitivity by mt-sDNA but not FIT [56,57]. Although not explicitly modeled, hyperplastic polyp detection is reflected in the false-positive rate (1-specificity) for colonoscopy. The model also does not account for performance of colonoscopy (adenoma sensitivity) in the real-world that varies by endoscopist [58] and by screening versus follow-up [59]. Finally, to be comparable to CISNET models, the primary CRC-AIM analysis was limited to using CISNET test characteristic parameters. In sensitivity analyses of analysis #1 and #3, inputs for sensitivity by adenoma size/location and specificity by age for mt-sDNA and FIT were used after obtaining clinical data from the sponsor of the study published by Imperiale et al (DeeP-C; clinicaltrials.gov identifier, NCT01397747) [25]. Although the changes to the predicted outcomes were relatively small, more granular test performance is also needed for other screening modalities for equivalent comparisons because small changes may impact efficiency frontiers.
The validation results demonstrated that CRC-AIM generates similar stool screening outcomes in terms of colonoscopies, LYG, CRC incidence reduction, and CRC mortality reduction compared with the CISNET models, but there could be underlying deep model processes that would cause the model to generate discordant outcomes compared to other models under different scenarios or for different metrics. However, CRC-AIM was validated toward its intended use for the current analysis, namely, demonstrating it can adequately replicate harmbenefit tradeoffs for optimal strategy selection. Additionally, other model outcomes have been made publicly available for transparency: cross-model comparisons of natural history outcomes are in a preprint server [17]; the most relevant screening outcomes for CRC-AIM are in the supplement; and screening outcomes for all start/stop ages and intervals are provided in an online repository available at https://github.com/CRCAIM/CRC-AIM-Public.

Conclusion
Based on the CRC-AIM model, all stool-based screening strategies decrease CRC-related incidence and mortality compared with no screening, regardless of adherence assumptions. Conclusions about the comparative tradeoffs between burden and benefit that are used to inform CRC screening guidelines change when adherence rates are assumed to be less than 100%. At imperfect adherence rates, mt-sDNA provides more LYG than FIT or HSgFOBT at an acceptable tradeoff in screening burden. LYG from FIT and HSgFOBT are more sensitive to changes in adherence assumptions than mt-sDNA because they require more tests be completed for equivalent benefit.
Real-world adherence to stool-based testing is imperfect and should be simulated using realistic inputs in comparative effectiveness modeling to more accurately assess the benefits of screening. Physicians and guidelines should consider the impact of imperfect adherence when making recommendations to patients. HSgFOBT vs 70% mt-sDNA adherence or B) assuming 60% FIT vs 52% HSgFOBT vs 70% mt-sDNA adherence and by C) patient hours related to the screening process and assuming 50% FIT vs 43% HSgFOBT vs 70% mt-sDNA adherence or D) assuming 60% FIT vs 52% HSgFOBT vs 70% mt-sDNA adherence. Results shown are per 1000 individuals free of diagnosed colorectal cancer at age 40 and screened starting at age 45 or 50 and ending at age 75 or 80 receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. NE, near-efficient. (TIF)

S7 Fig.
Life-years gained for individuals 40 years of age with stool-based tests by A) number of colonoscopies or B) patient hours related to the screening process assuming base-case imperfect adherence rates of 40% FIT vs 34% HSgFOBT vs 70% mt-sDNA using the CISNET fixed split adherence approach. Results shown are per 1000 individuals free of diagnosed colorectal cancer at age 40 and screened starting at age 45 or 50 and ending at age 75 or 80 receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. NE, near efficient. (TIF) S1 Table. Screening characteristics for Deep-C (clinicaltrials.gov identifier, NCT01397747) sensitivity analyses using granular adenoma size. (DOCX) S2 Table. Screening outcomes per 1000 individuals by adherence rate for triennial mt-sDNA, annual FIT, and annual HSgFOBT in individuals free of diagnosed colorectal cancer at age 40 and screened between ages 50-75 years or 45-75 years. (DOCX) S3 Table. Outcomes and efficiency ratios of LYG relative to number of colonoscopies assuming perfect (100%) adherence. Results are ordered by total colonoscopies. Results shown are per 1000 individuals free of diagnosed colorectal cancer at age 40 and screened starting at age 45 or 50 and ending at age 75 or 80 receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. Bold row is the model-recommended strategy. (DOCX) S4 Table. Outcomes and efficiency ratios of LYG relative to number of colonoscopies at base-case imperfect adherence rates of 40% FIT vs 34% HSgFOBT vs 70% mt-sDNA. Results are ordered by total colonoscopies. Results shown are per 1000 individuals free of diagnosed colorectal cancer at age 40 and screened starting at age 45 or 50 and ending at age 75 or 80 receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. Gray highlight indicates shift from dominated to efficient or near-efficient from 100% adherence assumption. Italics indicates shift from efficient or near-efficient to dominated from 100% adherence assumption. Bold row is the model-recommended strategy. (DOCX) S5 Table. Incremental efficiency ratios for number of colonoscopies at a fixed screening window of 50-75 or 45-75 assuming perfect (100%) adherence, base-case imperfect adherence rates of 40% FIT vs 34% HSgFOBT vs 70% mt-sDNA, adherence of 50% FIT vs 43% HSgFOBT vs 70% mt-sDNA, or adherence of 60% FIT vs 52% HSgFOBT vs 70% mt-sDNA. Results are ordered by total colonoscopies. Results shown are per 1000 individuals free of diagnosed colorectal cancer receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. (DOCX) S6 Table. Outcomes and efficiency ratios based on patient hours related to the screening process assuming perfect (100%) adherence. Results are ordered by patient hours. Results shown are per 1000 individuals free of diagnosed colorectal cancer at age 40 and screened starting at age 45 or 50 and ending at age 75 or 80 receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. Bold row is the model-recommended strategy. (DOCX) S7 Table. Outcomes and efficiency ratios based on patient hours related to the screening process at base-case imperfect adherence rates of 40% FIT vs 34% HSgFOBT vs 70% mt-sDNA. Results are ordered by patient hours. Results shown are per 1000 individuals free of diagnosed colorectal cancer at age 40 and screened starting at age 45 or 50 and ending at age 75 or 80 receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. Italics indicates shift from efficient or near-efficient to dominated from 100% adherence assumption. Bold row is the model-recommended strategy. (DOCX) S8 Table. Incremental efficiency ratios for patient hours related to the screening process at a fixed screening window of 50-75 or 45-75 assuming perfect (100%) adherence, base-case imperfect adherence rates of 40% FIT vs 34% HSgFOBT vs 70% mt-sDNA, 50% FIT vs 43% HSgFOBT vs 70% mt-sDNA adherence, or 60% FIT vs 52% HSgFOBT vs 70% mt-sDNA adherence. Results are ordered by patient hours. Results shown are per 1000 individuals free of diagnosed colorectal cancer receiving biennial or triennial mt-sDNA, annual or biennial FIT, and annual or biennial HSgFOBT. (DOCX) S9 Table. Predicted outcomes per 1000 individuals screened from ages 45-75 compared with no screening. Individuals were randomly assigned numbers of mt-sDNA (max = 11 triennial tests during the screening window) or FIT (max = 31 annual tests during the screening window). (DOCX) S10 Table. Deep-C (clinicaltrials.gov identifier, NCT01397747) sensitivity analysis for screening outcomes per 1000 individuals by adherence rate for triennial mt-sDNA and annual FIT in individuals free of diagnosed colorectal cancer at age 40 and screened between ages 50-75 years or 45-75 years. (DOCX) S11 Table. Deep-C (clinicaltrials.gov identifier, NCT01397747) sensitivity analysis of predicted outcomes per 1000 individuals screened from ages 50-75 compared with no screening. Individuals were randomly assigned numbers of mt-sDNA (max = 9 triennial tests during the screening window) or FIT (max = 26 annual tests during the screening window).