Choice of CTO scores to predict procedural success in clinical practice. A comparison of 4 different CTO PCI scores in a comprehensive national registry including expert and learning CTO operators

Background: We aimed to compare the performance of the recent CASTLE score with the J-CTO, CL and PROGRESS CTO scores in a comprehensive database of percutaneous coronary intervention procedures for chronic total occlusion. Methods: Scores were calculated using raw data from 1,342 chronic total occlusion procedures included in the REBECO Registry, which includes learning and expert operators. Calibration, discrimination and reclassification were evaluated and compared. Results: Mean score values were: CASTLE 1.60±1.10, J-CTO 2.15±1.24, PROGRESS 1.68±0.94 and CL 2.52±1.52 points. The overall percutaneous coronary intervention success rate was 77.8%. Calibration was good for CASTLE and CL, but not for the J-CTO or PROGRESS scores. Discrimination: the area under the curve (AUC) of CASTLE (0.633) was significantly higher than that of PROGRESS (0.557) and similar to those of J-CTO (0.628) and CL (0.652). Reclassification: CASTLE, as assessed by integrated discrimination improvement, was superior to PROGRESS (integrated discrimination improvement +0.036, p<0.001), similar to J-CTO and slightly inferior to the CL score (-0.011, p = 0.004). Regarding net reclassification improvement, CASTLE reclassified better than PROGRESS (overall continuous net reclassification improvement 0.379, p<0.001) in roughly 20% of cases. Conclusion: Procedural percutaneous coronary intervention difficulty is not consistently depicted by available chronic total occlusion scores and is influenced by the characteristics of each chronic total occlusion cohort. In our study population, including expert and learning operators, the CASTLE score had slightly better overall performance along with the CL score. However, we found only intermediate performance in the c-statistic predicting chronic total occlusion success among all scores.



Background
Percutaneous coronary intervention (PCI) of a chronic total occlusion (CTO) is currently one of the most complex procedures in interventional cardiology. Compared with non-CTO PCI, interventions in chronically occluded vessels require more time, toolbox resources and radiation exposure, and carry a higher risk of complications [1][2][3]. Therefore, patients should undergo a comprehensive pre-procedural assessment, including symptoms, ischemia and viability testing, in order to establish a clear clinical indication. Currently, patients are referred for CTO PCI to improve symptoms, to reduce a significant ischemic burden or to achieve complete revascularization with the aim of improving left ventricular ejection fraction (LVEF) [2,4].
Once the clinical indication is established, the operator must have a realistic estimate of the probability of procedural success, which will eventually be discussed with the patient and used for clinical decision-making. Procedural difficulty is largely dictated by the structural characteristics of the CTO, the coronary anatomy and clinical factors. Several scores have been developed over the last ten years to integrate these variables and provide an objective assessment of procedural CTO difficulty. Following the Multicenter CTO registry in Japan (J-CTO) score [5], the Clinical and Lesion-related (CL) [6] and Prospective Global Registry for the Study of Chronic Total Occlusion Intervention (PROGRESS CTO) scores [7] were developed and tested in study populations. More recently, the CASTLE score has been proposed, and it is of interest for several reasons [8]. First, its derivation dataset is by far the largest (14,882 patients from the EuroCTO registry compared with 329, 1,143 and 521 patients for the J-CTO, CL and PROGRESS scores, respectively). Second, it is representative of a large number of different European centres and operators encompassing the whole spectrum of CTO-PCI approaches. And third, it is the most recent and thus might reflect the impact of novel contemporary devices on CTO recanalization success.
In this study we aim to compare the performance of the new CASTLE score with the earlier and representative J-CTO, CL and PROGRESS CTO scores using an extensive database of CTO-PCI procedures (the Iberian Registry of CTO PCI, or REBECO). In brief, the Association of Interventional Cardiology of the Spanish Society of Cardiology prompted an open initiative to gather prospective CTO PCI data across Spain. This ongoing Registry involves centres with varying expertise in CTO PCI and has a pragmatic character, intending to record every CTO attempt in daily practice in a real-world dataset.

Methods
The methodology, variable definitions and first results of the Iberian Registry of CTO PCI are discussed elsewhere [9]. For the present analysis (n = 1,626 CTO cases), we used a data extraction from 24 centers taken on August 31, 2019 (cases belong to the timeframe 2015-2019). From this database, we selected those cases with valid values for all critical variables used to estimate the four scores plus the variable CTO technical success (n = 1,342 CTO cases). Technical success was defined as CTO recanalization with final TIMI 3 flow. Subsequently, the J-CTO, CL, PROGRESS and CASTLE scores were independently calculated from the raw registry data (Table 1 summarizes the definition of each score, taken from the original publications [5][6][7][8]). Each score was dichotomized into simple or complex cases for secondary analysis using cutoffs chosen from clinical practice [10] (CASTLE <4 vs ≥4, J-CTO <3 vs ≥3, PROGRESS CTO <3 vs ≥3, CL score <5.5 vs ≥5.5).
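The case selection and dichotomization described above can be sketched as follows. This is a minimal illustration, not the registry's actual code: the cutoffs are those quoted in the text, while the field names, the function names and the three example cases are hypothetical.

```python
# Cutoffs for "complex" cases as quoted in the text. The dictionary layout,
# field names and functions are illustrative assumptions.
COMPLEX_CUTOFFS = {
    "CASTLE": 4.0,
    "J-CTO": 3.0,
    "PROGRESS": 3.0,
    "CL": 5.5,
}


def has_all_scores(case: dict) -> bool:
    """Complete-case filter: all four scores and the outcome must be present."""
    required = list(COMPLEX_CUTOFFS) + ["technical_success"]
    return all(case.get(key) is not None for key in required)


def is_complex(score_name: str, value: float) -> bool:
    """Return True when the value is at or above the complex-case cutoff."""
    return value >= COMPLEX_CUTOFFS[score_name]


# Hypothetical cases (made-up values, for illustration only).
cases = [
    {"CASTLE": 1, "J-CTO": 2, "PROGRESS": 1, "CL": 2.5, "technical_success": True},
    {"CASTLE": 4, "J-CTO": 3, "PROGRESS": 2, "CL": 5.5, "technical_success": False},
    {"CASTLE": 2, "J-CTO": None, "PROGRESS": 1, "CL": 3.0, "technical_success": True},
]

# Keep only cases in which every score and the outcome are available.
complete = [case for case in cases if has_all_scores(case)]

# Label each remaining case as simple (False) or complex (True) per score.
labels = [
    {name: is_complex(name, case[name]) for name in COMPLEX_CUTOFFS}
    for case in complete
]
```

In this toy example the third case is dropped because its J-CTO value is missing, which mirrors the reduction from 1,626 to 1,342 cases only in spirit.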
Qualitative variables were summarized by frequency distribution, and quantitative variables as mean values and SDs. Continuous, non-normally distributed variables were expressed as medians and interquartile ranges (IQR). A chi-square test for linear trend was used to assess observed success rates across strata of the four scores. The Hosmer-Lemeshow goodness-of-fit test, obtained by univariate logistic regression with procedural success as the dependent variable, was used to assess the calibration of the scores. Discrimination was analyzed with the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. AUCs were compared, taking the CASTLE AUC as a reference, with the DeLong method [11], and p-values were corrected by the Bonferroni method.
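As a rough illustration of the discrimination analysis, the AUC of a difficulty score can be computed as the proportion of success/failure pairs in which the successful case has the lower score, and a Bonferroni correction multiplies each p-value by the number of comparisons. The sketch below assumes this pairwise (Mann-Whitney) formulation and does not implement the DeLong test itself; function names and data are hypothetical.

```python
def auc_for_success(scores, successes):
    """AUC of a difficulty score for predicting success.

    Counts the success/failure pairs in which the successful case has the
    lower score (ties count one half), divided by the number of pairs.
    """
    success_scores = [s for s, y in zip(scores, successes) if y == 1]
    failure_scores = [s for s, y in zip(scores, successes) if y == 0]
    if not success_scores or not failure_scores:
        raise ValueError("both successes and failures are required")
    concordant = 0.0
    for s in success_scores:
        for f in failure_scores:
            if s < f:
                concordant += 1.0
            elif s == f:
                concordant += 0.5
    return concordant / (len(success_scores) * len(failure_scores))


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up example data: six cases, 1 = technical success.
example_scores = [0, 1, 1, 2, 3, 4]
example_success = [1, 1, 0, 1, 0, 0]
example_auc = auc_for_success(example_scores, example_success)

# Three hypothetical pairwise comparisons against a reference score.
adjusted = bonferroni([0.01, 0.2, 0.5])
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a registry of this size; a rank-based implementation would scale better.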
To assess discrimination and reclassification ability, each score was compared with the CASTLE score as a reference using the absolute integrated discrimination improvement (IDI) index and the continuous net reclassification index (NRI) [12]. Sensitivity, specificity, and positive and negative predictive values for technical success were calculated for the complex-case cutoffs previously defined by expert consensus and for the cutoffs with the highest Youden index. Statistical analysis was performed with STATA version 15.0 and IBM SPSS Statistics Version 21.0 (IBM Corporation, Chicago, Illinois). A 2-tailed p-value of <0.05 was considered statistically significant.
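For readers less familiar with the two reclassification indices, the following sketch shows how they are commonly computed from the predicted probabilities of a reference model and a comparison model. It assumes the usual definitions (IDI as the change in the difference of mean predicted probabilities between events and nonevents; continuous NRI as the net proportion of events moving up plus the net proportion of nonevents moving down). Which outcome counts as the "event" and all data values are assumptions made for illustration.

```python
from statistics import mean


def idi(p_reference, p_new, events):
    """Integrated discrimination improvement of p_new over p_reference."""
    ev_new = mean([p for p, e in zip(p_new, events) if e])
    ev_ref = mean([p for p, e in zip(p_reference, events) if e])
    ne_new = mean([p for p, e in zip(p_new, events) if not e])
    ne_ref = mean([p for p, e in zip(p_reference, events) if not e])
    return (ev_new - ev_ref) - (ne_new - ne_ref)


def continuous_nri(p_reference, p_new, events):
    """Continuous (category-free) net reclassification improvement."""
    up_ev = down_ev = up_ne = down_ne = 0
    n_ev = n_ne = 0
    for ref, new, event in zip(p_reference, p_new, events):
        if event:
            n_ev += 1
            up_ev += new > ref
            down_ev += new < ref
        else:
            n_ne += 1
            up_ne += new > ref
            down_ne += new < ref
    net_events = (up_ev - down_ev) / n_ev
    net_nonevents = (down_ne - up_ne) / n_ne
    return net_events + net_nonevents


# Made-up predicted probabilities for four cases (two events, two nonevents).
p_ref = [0.8, 0.7, 0.6, 0.5]
p_new = [0.9, 0.6, 0.4, 0.5]
events = [1, 1, 0, 0]

example_idi = idi(p_ref, p_new, events)
example_nri = continuous_nri(p_ref, p_new, events)
```

In this toy example the events are reclassified up and down equally often, so the whole improvement comes from the nonevents.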

Results
Clinical and interventional characteristics of the 1,342 patients included in the study are shown in Tables 2 and 3. The mean score was 1.60±1.10 for CASTLE, 2.15±1.24 for J-CTO, 1.68±0.94 for PROGRESS and 2.52±1.52 for CL. The overall success rate in the patients included in the study was 77.8%.
Fig 1 shows the score distribution in the study sample compared with each score's original derivation cohort [5][6][7][8], except for the CL score, for which only aggregated data were provided. The original CL score distribution was 33.07% for scores 0-1, 37.27% for scores 1.5-2.5, 25.55% for scores 3.5-4.5 and 4.11% for scores ≥5, compared with 25.6%, 36.2%, 30.5% and 7.7% in the same categories in the Iberian Registry. The distributions show that the Iberian Registry cohort has lower complexity than the CASTLE derivation cohort but higher complexity than the J-CTO, PROGRESS or CL derivation cohorts.
Fig 2 shows the predicted and observed success rates for each possible score category (obtained by logistic regression analysis). All scores showed a trend towards an inverse linear relationship between procedural success rate and score value (p<0.001 for all scores), but in the upper range of scores the actual success rate was higher than expected, irrespective of the score employed. Conversely, at lower score values the actual success rate was overestimated. The CASTLE and CL scores were well calibrated according to the Hosmer-Lemeshow goodness-of-fit test (p>0.05), whereas the PROGRESS and J-CTO scores were not (p<0.05). Note that although the success rates were higher than expected in the highest complexity strata of the CASTLE and CL scores (5-6 and 7-8 points, respectively), these strata represent <1% of the study population (see Fig 1).
The discrimination of the scores for procedural success was tested using the AUC of the ROC curve (Fig 3). The AUC of PROGRESS was significantly lower than that of CASTLE, whereas the AUCs of the J-CTO and CL scores did not differ from that of CASTLE. However, the overall discriminatory capacity of all scores was limited (by consensus, an AUC <0.7 is considered poor to moderate discrimination [13]).
Comparing CASTLE with J-CTO, we found no differences in reclassification ability as evaluated with IDI/NRI (Table 4); however, CASTLE was superior to PROGRESS (IDI +0.036, p<0.001 and NRI +0.379, p<0.001). Finally, CASTLE had a slightly lower IDI than the CL score (-0.011, p = 0.004), but a similar NRI (p = 0.31). Table 5 shows sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and Youden index values at the "complex CTO" cutoffs (Table 5A). We also estimated these values at the cutoffs with the best Youden index (Table 5B).

Discussion
This study provides a comprehensive comparison of the new CASTLE score against the most commonly used CTO scores in an extensive national database of CTO procedures that includes expert and learning CTO operators. However, we found poor-to-intermediate performance in the c-statistic predicting CTO success among all scores. With small differences, the CASTLE score performed best along with the CL score, followed by J-CTO and PROGRESS with slightly worse efficiency. An interesting lesson of this study is that, when different scores are applied to the same cohort, the spectrum of difficulty varies and is not the same as in the original score cohorts (Fig 1). In other words, there seems to be a lack of consistency among them. How, then, should a CTO score be chosen? The probability of success in CTO PCI depends on multiple factors. On the one hand, it depends on the operator's experience and on the availability and use of an extensive toolbox. On the other, it relies on the use of a structured approach, in which CTO PCI scores play an essential role [2,14]. First, they allow operators to gauge the feasibility of procedural success according to their level of expertise, particularly over the learning curve of CTO PCI. Second, they facilitate Heart Team discussions in cases in which CTO lesions are critical targets for achieving an equivalent degree of myocardial revascularization. And third, they provide insights into the procedural time, amount of contrast, radiation dose and risk of complications associated with the intervention, which can be integrated into the decision-making process and into PCI planning [14].
PLOS ONE | Choice of a CTO score
CASTLE is the CTO score derived from the largest dataset (14,882 patients, from 2008 to 2014), encompassing a broader range of operators, techniques and practices across Europe. Some relevant differences compared with previous scores must be pointed out. The J-CTO score was derived from a multi-centre Japanese database comprising 400 procedures (2006-2007) and was designed to estimate the likelihood of passing an antegrade guidewire in less than 30 minutes [5]. It was subsequently validated extensively as a predictor of success and even of outcomes [15,16]. The CL score was designed to assess procedural failure in a first CTO-PCI procedure and was the first to include clinical variables. However, it was derived from a single-centre European cohort of mainly (90.7%) antegrade CTO cases (n = 1,671, from 2004 to 2013) [6]. PROGRESS CTO was derived from a multi-centre US database, is more contemporary to CASTLE (2012-2015), and was designed to assess technical success using the hybrid approach [7]. These three were chosen as comparators because they are the most commonly used scores and represent different regional approaches.
The heterogeneous derivation cohorts probably explain why the different scores include a heterogeneous set of variables, except for the blunt proximal cap (Table 1). Two scores include clinical variables. Both include previous CABG, which is acknowledged as an adverse feature in CTO patients [17,18]. Moreover, a recent study from the PROGRESS database found 5% less recanalization success in CABG patients compared with non-CABG patients [19]. CTO characteristics such as tortuosity and calcification are more difficult to evaluate consistently, as they are defined differently among scores and their assessment may be somewhat operator-dependent. However, we agree that, unlike the softer J-CTO definitions [5], severe tortuosity and severe calcification are common ground, especially in combination, for challenging CTO scenarios. Finally, it is remarkable that only one score (PROGRESS) includes the assessment of collaterals, which is very relevant for CTO planning.
Despite these differences, we genuinely believe that the careful CTO evaluation needed to calculate one or more scores is valuable for procedural planning, especially for the less experienced operator.
Comparing our cohort with the derivation cohorts of previous scores, we see a shift towards higher complexity in the REBECO Registry compared with the older J-CTO, CL and PROGRESS scores (our cohort starts in 2015, whereas J-CTO, CL and PROGRESS have older derivation cohorts [5][6][7]). This is probably the result of the widespread standardization of techniques and equipment, which allows CTO operators to tackle more complex cases [2]. On the other hand, the pragmatic nature of the Iberian Registry means that it includes centres and operators at different points of the CTO learning curve, quite unlike, for example, the CASTLE data derived from the highly expert EuroCTO Club [8]. Consequently (and notwithstanding the differences in endpoints), our recanalization success rate (77.8%) was lower than the 84.2% in CASTLE, 92.5% in PROGRESS and 88.6% in J-CTO, but higher than in the CL cohort (72.5%) [5][6][7][8].
Many of the newly developed scores compared themselves with J-CTO in the original reports, showing better calibration and discrimination parameters [5][6][7][8]. However, this comparison might be biased because the validation and derivation cohorts come from a single "mother" cohort, which is likely to perform better with its derived score than with an external score (J-CTO). Some score comparisons have been published previously, showing that the performance of the scores might be similar. Karatasakis et al. compared the CL, J-CTO and PROGRESS scores in a cohort from the PROGRESS CTO registry (n = 664), showing similarly poor to moderate (<0.7) discrimination as evaluated by AUC (with no inter-score differences) [20]. This analysis might be biased because the study sample also comes from the PROGRESS CTO Registry. Recently, Kalogeropoulos et al. used an international database (n = 660) to compare CASTLE with J-CTO, finding equal overall discriminatory capacity of both scores (AUC 0.676 and 0.698, respectively). CASTLE outperformed J-CTO in the most complex cases (J-CTO ≥3 or CASTLE ≥4, representing only roughly 9% of the sample) but with a quite low overall AUC (0.588). Our comparison study has the strengths of a more extensive, independent testing database (n = 1,342) and a more comprehensive analysis using four scores and taking the CASTLE score as a reference.
In our study, calibration (that is, how close the observed and expected results were) was better for the CASTLE and CL scores, although this test does not allow for inter-score comparisons. However, the differences are rather small (Fig 2). Discrimination measured with the AUC and with the IDI index was better for CASTLE, J-CTO and CL and slightly worse for the PROGRESS score; however, it was poor to moderate (<0.7) in absolute terms, in agreement with previous publications [10,20]. To complement the AUC, which has some limitations [21], we assessed the incremental value of the newer CASTLE score using two reclassification indices, the Integrated Discrimination Improvement (IDI) and the Net Reclassification Improvement (NRI) [12,22]. The IDI, which is possibly more sensitive than the AUC comparison, showed that CASTLE had better discrimination than PROGRESS, similar to J-CTO and slightly inferior to CL. The NRI analysis showed that CASTLE reclassified cases better than PROGRESS in roughly 20% of cases (CASTLE correctly reclassified 17.82% of event cases and 20.13% of nonevent cases into a higher or lower predicted risk of success, respectively). However, CASTLE did not significantly improve reclassification compared with the J-CTO and CL scores. The intermediate or poor performance of currently available CTO scores in predicting CTO success suggests the need for more precise mechanisms to predict outcomes and accurately inform our patients.
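The reclassification figures quoted for the CASTLE versus PROGRESS comparison can be related to the reported NRI with a short worked sum. This assumes that 17.82% and 20.13% are net proportions (cases moved in the correct direction minus cases moved in the wrong direction) among event and nonevent cases, which the text does not state explicitly.

```latex
\mathrm{NRI}_{\text{continuous}}
  = \underbrace{P(\text{up}\mid\text{event}) - P(\text{down}\mid\text{event})}_{\approx\, 0.1782}
  + \underbrace{P(\text{down}\mid\text{nonevent}) - P(\text{up}\mid\text{nonevent})}_{\approx\, 0.2013}
  \approx 0.3795
```

Under that assumption the sum agrees, up to rounding of the two percentages, with the reported overall continuous NRI of 0.379.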
In Table 5 we provide data on sensitivity, specificity, PPV, NPV and Youden index, showing only modest values both with commonly used complex CTO cutoffs and with the cutoffs that maximize the Youden index. However, we believe that binary categorization is unnecessary in CTO scores because it masks the spectrum of complexity provided by the scores. Furthermore, PPV and NPV provide insufficient precision to inform for or against attempting a specific CTO case (in the hands of highly expert operators, high J-CTO scores have been shown not to be associated with observed success rates [23]).
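The cutoff-based measures discussed above can be sketched as follows. The example assumes that a score below the cutoff is read as "predicted success" and that technical success is the positive class; the function names and the six example cases are hypothetical and do not reproduce the registry data.

```python
def diagnostic_metrics(scores, successes, cutoff):
    """Treat score < cutoff as 'predicted success' and compare with outcomes."""
    tp = fp = tn = fn = 0
    for score, success in zip(scores, successes):
        predicted_success = score < cutoff
        if predicted_success and success:
            tp += 1
        elif predicted_success and not success:
            fp += 1
        elif not predicted_success and not success:
            tn += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both successes and failures are required")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # PPV/NPV are undefined when no case falls on that side of the cutoff.
        "ppv": tp / (tp + fp) if tp + fp else None,
        "npv": tn / (tn + fn) if tn + fn else None,
        "youden": sensitivity + specificity - 1.0,
    }


def best_youden_cutoff(scores, successes):
    """Return the observed score value that maximizes the Youden index."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda c: diagnostic_metrics(scores, successes, c)["youden"],
    )


# Made-up example data: six cases, 1 = technical success.
example_scores = [0, 1, 1, 2, 3, 4]
example_success = [1, 1, 0, 1, 0, 0]

at_cutoff_3 = diagnostic_metrics(example_scores, example_success, 3)
best_cutoff = best_youden_cutoff(example_scores, example_success)
```

Because the Youden index weighs sensitivity and specificity equally, the cutoff it selects need not coincide with the consensus "complex CTO" cutoff.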
The information provided by our study may also help in selecting a specific CTO PCI score for particular purposes. Very experienced operators with success rates over 90% will take little interest in the success discrimination capabilities but might use CASTLE or CL to discuss efficiency and complication risks with patients (both have the advantage of combining angiographic and clinical variables; CASTLE is probably more intuitive to calculate and has fewer categories than the CL score). Less experienced operators might choose the CASTLE or CL scores on the grounds of better calibration and discrimination to predict which highly complex cases might benefit from proctoring or referral (although we must bear in mind that in this or any other study the overall discrimination is poor to moderate). Also, the predicted success rates can easily be obtained with univariate logistic regression analysis in a local database, using one or more scores to choose the one with the best "personalized" calibration. For research and benchmarking, J-CTO is the oldest and most widespread score and thus is critical to allow comparison with earlier studies. CASTLE is without a doubt the score with the most solid foundations in terms of contemporaneity and derivation dataset size, so it will probably be a reference for future publications.
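The "personalized" calibration mentioned above could be approached along the following lines. This is a minimal sketch under stated assumptions: a univariate logistic model fitted by Newton-Raphson, a table of observed versus predicted success rates per score value, and a Hosmer-Lemeshow-type statistic that groups cases by score value rather than by deciles of risk. The local database is made up, and no p-value is computed.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_univariate_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of P(success) = sigmoid(b0 + b1 * score)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def calibration_by_score(x, y, b0, b1):
    """Per score value: case count, observed and predicted success rate."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    return {
        value: {
            "n": len(outcomes),
            "observed": sum(outcomes) / len(outcomes),
            "predicted": sigmoid(b0 + b1 * value),
        }
        for value, outcomes in sorted(groups.items())
    }


def hosmer_lemeshow_type_statistic(table):
    """Sum over strata of (O - E)^2 / (E * (1 - E / n)), O and E as counts."""
    total = 0.0
    for row in table.values():
        n = row["n"]
        observed = row["observed"] * n
        expected = row["predicted"] * n
        total += (observed - expected) ** 2 / (expected * (1.0 - expected / n))
    return total


# Hypothetical local database: (score value, successes, failures).
local_counts = [(0, 9, 1), (1, 8, 2), (2, 6, 4), (3, 4, 6)]
x, y = [], []
for value, n_success, n_failure in local_counts:
    x += [value] * (n_success + n_failure)
    y += [1] * n_success + [0] * n_failure

b0, b1 = fit_univariate_logistic(x, y)
table = calibration_by_score(x, y, b0, b1)
hl_statistic = hosmer_lemeshow_type_statistic(table)
```

Repeating the fit for each candidate score on the same local cases and comparing the resulting tables would indicate which score tracks the local observed success rates most closely.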
A few limitations of this study should be reported. First, the original angiograms were not assessed by a central core lab; we relied on the individual investigator's evaluation of each item contributing to the different scores. Second, the collected data are self-reported and not systematically audited, so in spite of quality control some degree of selection and reporting bias is possible. Third, although this is a multi-centre, contemporary database, it might not be representative of specific practices, strategies or skillsets. Finally, more scores might have been considered for comparison, although, as discussed before, we consider those included to be the most commonly used scores.
Several available CTO scores were not assessed in our study on the grounds of the preferential use of a specific device or strategy (CrossBoss and hybrid techniques in Europe, RECHARGE registry [24]); derivation from a single operator's experience (ORA score [25]); or the need for interesting but non-mandatory methods in CTO assessment (CT-RECTOR [26] or KCCT [27] scores). However, many predictors of procedural failure are common: stump, calcification, tortuosity, length, previous CABG.

Conclusion
We compared 4 CTO recanalization success scores in a large, independent, multicenter database. Overall discrimination was poor to moderate with c-statistics predicting CTO success below 0.7 among all scores. CASTLE score performed best along with CL score, followed by J-CTO and PROGRESS with slightly worse efficiency.
Procedural PCI difficulty is not consistently depicted by available CTO scores and is probably influenced by the characteristics of each CTO cohort. Operators at different points of their learning curve should be aware of this and consider the choice of an adequate score for a specific purpose. In the case of our study population, including expert and learning operators, the CASTLE score had slightly better overall performance along with the CL score.