Potential value and limitations of different clinical scoring systems in the assessment of short- and long-term outcome following orthotopic liver transplantation

Background In an attempt to further improve liver allograft utilization and outcome in orthotopic liver transplantation (OLT), a variety of clinical scoring systems have been developed. Here we aimed to comparatively investigate the association of the Balance-of-Risk (BAR), Survival-Outcomes-Following-Liver-Transplant (SOFT), Preallocation-Survival-Outcomes-Following-Liver-Transplant (pSOFT), Donor-Risk-Index (DRI), and the Eurotransplant-Donor-Risk-Index (ET-DRI) scores with short- and long-term outcome following OLT.
Methods We included 338 consecutive patients who underwent OLT in our institution between May 2010 and November 2017. For each prognostic model, the optimal cutoff values were determined with the help of the Youden-index, and their diagnostic accuracy for 90-day post-OLT mortality and major postoperative complications was measured by the area under the receiver operating characteristic curve (AUROC). Patient and graft survival were analyzed using the Kaplan-Meier method and the log-rank test. Morbidity was assessed using the Clavien-Dindo classification and the Comprehensive-Complication-Index.
Results BAR, SOFT, and pSOFT performed well above the conventional AUROC threshold of 0.70 with good prediction of early mortality. Only BAR showed an AUROC>0.70 for both mortality and major morbidity. With the cutoffs of 14, 31, and 22 for BAR, SOFT, and pSOFT, respectively, subgroup analysis showed significant differences (p<0.001) in morbidity and mortality, length of intensive care and hospital stay, and early allograft dysfunction rates. Five-year patient survival was inferior in the high BAR, pSOFT, and SOFT groups.
Conclusions Out of all scores tested, the BAR score had the best value in predicting both 90-day morbidity and mortality after OLT, showing the highest AUROCs. The pSOFT and SOFT scores demonstrated an acceptable accuracy in predicting 90-day morbidity and mortality. The BAR, SOFT, and pSOFT cutoffs used allowed the identification of patients at risk in terms of five-year patient survival. The DRI and ET-DRI scores failed to predict recipient outcomes in the present setting.


Introduction
Over the past 60 years, orthotopic liver transplantation (OLT) has evolved as the standard treatment for patients with end-stage liver disease and acute liver failure [1,2]. While surgical techniques, organ preservation, intensive care management and immunosuppression have significantly improved during this time [3], the gap between supply and demand for liver allografts continues to increase. Several strategies, such as living donation, splitting of cadaveric grafts for two recipients and transplantation of extended criteria donor allografts (ECD) have been implemented to expand the donor pool [4][5][6].
To improve transparency and to promote fair allocation of allografts, the former center-based allocation policy was replaced in 2006 by the MELD allocation system (Model for End-Stage Liver Disease), which became mandatory for all participating centers within the Eurotransplant network. This "sickest-patient first policy" led to the current situation, where many recipients suffer from advanced liver dysfunction and a poor general condition at the time of OLT [7]. In addition, more than 50% of potential donor allografts exhibit further risk factors, including advanced donor age, pronounced graft macrosteatosis and/or extended cold ischemic time (CIT) [8][9][10].
While the MELD-score is an accurate and well-documented 3-month mortality predictor for patients on the waiting list, it is not considered to be a suitable prediction tool for recipient outcomes following OLT [11,12]. Therefore, instead of relying exclusively on the expertise and subjective assessment of the transplant surgeon, an objective, accurate and feasible prediction model of postoperative outcome ahead of the OLT procedure would facilitate liver allograft allocation to the most suitable recipients. To this aim, various clinical scoring systems, using donor and recipient factors, have been developed over the past 10-15 years [13][14][15][16][17][18].
The goal of our present study is to assess the performance of these scoring systems, i.e. Balance-of-Risk (BAR), Survival-Outcomes-Following-Liver-Transplant (SOFT), Preallocation-Survival-Outcomes-Following-Liver-Transplant (pSOFT), Donor-Risk-Index (DRI), and the Eurotransplant-Donor-Risk-Index (ET-DRI), in predicting short- and long-term outcome in patients who underwent OLT at our institution.
Consecutive patients who underwent OLT between May 2010 and November 2017 at the University Hospital RWTH Aachen (UH-RWTH) were included in this study. Living related transplantations and patients with missing study-relevant data, for whom one or more scores could not be calculated, were excluded (n = 10).
Laboratory MELD (labMELD) was used in all instances and exceptional MELD points were not considered. The study was conducted at the UH-RWTH in accordance with the requirements of the Institutional Review Board of the RWTH Aachen University (EK-047/18), the current version of the Declaration of Helsinki as well as the Declaration of Istanbul and the good clinical practice guidelines (ICH-GCP). Informed consent was waived due to the retrospective study design and collection of readily available clinical data. Recipient and donor characteristics are shown in Tables 1 and 2.

Data collection and follow-up
Data were obtained from a prospectively maintained institutional database and analyzed retrospectively. Pre-transplant labMELD, DRI, ET-DRI, SOFT, pSOFT and BAR score were calculated as described below. Extended criteria donor allografts were defined according to the definition of the German Medical Chamber (donor age >65 years, ICU with mechanical [...]) [19]. To assess post-transplant early allograft dysfunction (EAD) the Olthoff criteria were adopted [20]. Postoperative morbidity was evaluated for all surgical complications registered during the first 90 days following OLT according to the Clavien-Dindo classification (CD) and the Comprehensive Complication Index (CCI) [21,22]. Postoperative transfusions were defined as blood products given within the first 7 days following OLT. Blood products administered later during the postoperative course and within the first 90 days were assessed among the postoperative complications. Length of ICU stay represents the initial stay after the OLT procedure until the transfer of the patient to a standard care transplantation ward. Readmission to ICU was assessed as part of the total hospital stay. Hospital stay was defined by the date of admission for OLT and the day of discharge from the UH-RWTH. Each patient was assessed regularly by the referring hepatologist or the local outpatient clinic. The follow-up examinations included a clinical examination, standard blood test with follow-up tumor markers and cross-sectional imaging if applicable.

Score models calculated
The analyzed prediction models (DRI, ET-DRI, SOFT, pSOFT and BAR) were calculated as described before [13,16,17,23]. Further details on the calculation of the used scores are available as supporting information (S1 Appendix). Local allocation was defined as the procurement area of the UH-RWTH, whereas the rest of Germany was regarded as a regional allocation. The rest of the Eurotransplant region was considered as national or extra-regional sources, depending on the calculated prediction model [13,14].

Study endpoints and statistical analysis
Ninety-day mortality following OLT was chosen as the primary endpoint for the assessment of the predictive abilities of the various scores. As secondary endpoints, 90-day morbidity and 5-year graft and patient survival were analyzed. The discriminative ability of the various score models for the prediction of 90-day survival and major complications (CD≥3b) was compared using receiver operating characteristic (ROC) curve analysis, calculating the area under the receiver operating characteristic curve (AUROC). The respective cutoff values of the potential prognostic models were selected with the help of the best Youden-index. The Hosmer-Lemeshow chi-square goodness-of-fit test was applied to test model suitability. Categorical data were analyzed using the chi-square test and Fisher's exact test, and continuous variables were compared using the Mann-Whitney U test. The prognostic value of the various clinical scores was demonstrated in the subgroup comparisons using the odds ratios and 95% confidence intervals from the univariable logistic regression analysis. The Spearman correlation coefficient was used to express further potential association between the assessed scores and various clinical outcome measures. To visualize patient and graft survival the Kaplan-Meier method was used. Survival data were analyzed with the log-rank test. All p-values<0.05 were considered statistically significant.
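As an illustration of the survival analysis described above, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the software used in this study; the function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show how censored observations leave the risk set without counting as events.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time per patient (e.g. months after OLT)
    events -- 1 if the endpoint (death or graft loss) occurred, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored patients are both removed after time t.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data, NOT taken from the study cohort.
toy_times = [2, 3, 3, 5, 8, 8, 12, 12]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t)={s:.3f}")
```

Curves estimated in this way for the groups above and below a score cutoff would then be compared with the log-rank test, which is not shown here.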

Recipient and donor characteristics
After applying the above-mentioned inclusion and exclusion criteria, 328 out of 338 consecutive OLTs were included in the analysis. The mean age of all the recipients was 54±11 years. Some 221 (67%) recipients were male and 107 (33%) were female. The mean pre-transplant laboratory MELD-score was 20±11. The most common indications for OLT were alcoholic cirrhosis (24% (78/328)) and hepatocellular carcinoma (23% (75/328)). Recipient characteristics and indications leading to listing for OLT are summarized in Table 1.

Impact of the BAR, pSOFT, SOFT, DRI, and ET-DRI and their determined cutoffs on postoperative morbidity and mortality
The mean values and standard deviations of the different scores were 1.77±0.34 for the DRI, 1.93±1.68 for the ET-DRI, 16±12 for the SOFT score, 12±11 for the pSOFT score and 9±11 for the BAR score (Tables 1 and 2).
The areas under the receiver operating characteristic curve (AUROC) for the prediction of 90-day mortality were 0.847 for the BAR (CI 0.761-0.934; p<0.001), 0.837 for the SOFT (CI 0.736-0.939; p<0.001) and 0.821 for the pSOFT scores (CI 0.714-0.928; p<0.001). The DRI and ET-DRI revealed AUROCs of 0.608 and 0.572, respectively. For the prediction of major complications (CD≥3b), the AUROC for the BAR score was 0.709 (CI 0.654-0.765; p<0.001), and for SOFT and pSOFT 0.680 and 0.661 (CI 0.623-0.738 and 0.602-0.720; each p<0.001), respectively. The DRI and ET-DRI showed a c-statistic<0.6 (0.535 and 0.555; p = 0.472 and p = 0.492, respectively) (Table 4). The goodness-of-fit testing, calculated for 90-day mortality and major morbidity, revealed a satisfactory model fit for each of the used scores (Table 4). The optimal score cutoff values were determined by the Youden-index and are shown in Table 5.
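For readers less familiar with these measures, the sketch below shows one way the AUROC and a Youden-index cutoff can be computed from a list of score values and binary outcomes. It is a plain-Python illustration under stated assumptions, not the statistical software used in this study: the function names `auroc` and `youden_cutoff` are hypothetical, the data are invented, and a patient is assumed to be classified as high risk when the score is greater than or equal to the cutoff.

```python
def auroc(scores, outcomes):
    """AUROC as the probability that a randomly chosen patient with the event
    has a higher score than a randomly chosen patient without it
    (ties count one half)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, outcomes):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    Assumption: a patient is called high risk when score >= cutoff.
    If several cutoffs share the best J, the lowest one is returned.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best = (None, -1.0)
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, outcomes) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best[1]:
            best = (c, j)
    return best


# Hypothetical toy data, NOT taken from the study cohort:
# score values and 90-day mortality (1 = died, 0 = survived).
toy_scores = [4, 6, 9, 11, 15, 18, 20, 25]
toy_deaths = [0, 0, 0, 1, 0, 1, 1, 1]

print("AUROC:", auroc(toy_scores, toy_deaths))
print("Youden cutoff and J:", youden_cutoff(toy_scores, toy_deaths))
```

The pairwise definition used in `auroc` is equivalent to the area under the empirical ROC curve and is chosen here only because it needs no external libraries.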
Next, we analyzed the ability of the different scores to stratify our patient cohort into high- and low-risk groups based on morbidity and mortality. As shown in Table 6, the subgroups of patients above the defined cutoff values of the BAR, pSOFT, and SOFT scores had significantly increased rates of major complications, higher CCI, higher 90-day mortality, and longer ICU and hospital stay. Only the defined pSOFT cutoff was able to stratify patients concerning a higher incidence of EAD. In case of DRI and ET-DRI no significant differences were found (Table 6). The association between perioperative outcome and the calculated values of the various scores was assessed further using Spearman's correlation coefficient. A moderately strong and significant positive association was observed between the BAR, pSOFT and SOFT score values and the days spent on ICU (BAR: r = 0.523 p<0.001; pSOFT: r = 0.511 p<0.001; SOFT: r = 0.502 p<0.001), length of hospital stay (BAR: r = 0.487 p<0.001; pSOFT: r = 0.534 p<0.001; SOFT: r = 0.532 p<0.001) as well as the cumulative 90-day CCI (BAR: r = 0.469 p<0.001; pSOFT: r = 0.434 p<0.001; SOFT: r = 0.441 p<0.001) (Table 7). No meaningful correlation was found between the above-mentioned outcome measures and the DRI and ET-DRI scores (Table 7).
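The Spearman coefficients reported above measure monotonic association between a score and an outcome. As a minimal sketch, the coefficient can be obtained as the Pearson correlation of the rank-transformed values, with tied values receiving the mean of their ranks. The function names and the toy data below are hypothetical and are not derived from the study cohort; significance testing is not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranked data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data, NOT taken from the study cohort:
# a score value and the length of ICU stay (days) for six patients.
toy_score = [2, 5, 7, 9, 14, 18]
toy_icu_days = [3, 2, 4, 6, 5, 20]

print(f"rho = {spearman_rho(toy_score, toy_icu_days):.3f}")
```

Because only ranks enter the calculation, a single very long ICU stay (such as the 20 days in the toy data) does not dominate the coefficient as it would with a Pearson correlation on the raw values.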

Discussion
Effective utilization of the existing organ donor pool and optimal matching of graft and recipient are of utmost clinical importance in solid organ transplantation and are currently based primarily on the subjective clinical evaluation of the transplant surgeon. A risk assessment tool that is able to reliably predict post-OLT outcomes is urgently needed to establish an objective and more transparent allocation. Even though several prediction tools have been developed, none of them has found its way into the clinical routine yet. Based on this, we aimed to comparatively assess the predictive value of five different clinical prediction tools (BAR, pSOFT, SOFT, DRI, ET-DRI) in the context of 90-day mortality/morbidity and 5-year graft and patient survival in adult recipients of OLT.
Following its initial development, subsequent studies validated the DRI as a potential independent predictor of allograft failure in different MELD categories in the post-MELD era [8,24,25]. The DRI, which was formulated in the pre-MELD era in 2005, showed a c-statistic ranging from 0.500 to 0.650 in separate studies, suggesting an already rather low association with outcome [13,26]. Our own findings showed a comparably low AUROC of 0.608 for the prediction of 90-day mortality. These findings are likely attributable to the well-known shortcomings of the DRI, such as its validation in the pre-MELD era and its disregard of relevant recipient risk factors. Accordingly, in a survey performed by Mataya et al. on the value of the DRI in clinical decision making, 73% of the respondents believed that the DRI is not a feasible tool to predict morbidity and graft failure following OLT. Moreover, 88% even stated that the index has misleading aspects, such as its poor predictive ability, the inclusion of irrelevant factors (e.g. ethnicity) and the omission of relevant factors (e.g. recipient factors and further important donor factors such as graft steatosis or vasopressor support) [27].
In recent years, the DRI was adapted to the Eurotransplant setting by Braat et al., replacing the risk factors "ethnicity" and "height" with the parameters "latest GGT" and "rescue offer". Braat et al. claimed that the ET-DRI may be a useful tool for liver allocation in the future [23]. However, with a c-statistic of 0.624 (overall graft survival), it appears to be a predictor of only limited utility. Later studies reported an even lower AUROC of 0.480-0.520 [28]. While Schoening et al. found a significant value of the ET-DRI for specific subgroups [29], our own findings showed a disappointing AUROC of 0.572 for 90-day mortality, indicating that the ET-DRI had only a limited value in the prediction of early outcome following OLT in our cohort. This is in line with a study of Reichert et al., who found an AUROC of 0.477 for three-month mortality and 0.524 for three-month graft survival in their European cohort [30]. Overall the DRI and ET-DRI performed well below the conventional AUROC threshold of 0.700 in this as well [...]
[...] [15,16], our analysis showed a promising AUROC of 0.837 for the SOFT score for the prediction of 90-day post-transplant mortality. The pSOFT score, which utilizes 14 recipient risk factors and was developed to weigh the expected mortality risk prior to transplant versus the risk without transplantation, showed an AUROC of 0.821 in our cohort. Since an AUROC between 0.8 and 0.9 represents an excellent diagnostic accuracy, it seems that the SOFT and the pSOFT scores are suitable and attractive tools to predict 90-day mortality. Nevertheless, in case of the SOFT/pSOFT scores, the inclusion of multiple variables, some of them being partially subjective and only semi-quantitative (e.g. encephalopathy, ascites), and a complex statistical modeling impair the practical applicability for prompt clinical assessment and decision-making prior to transplantation. With an AUROC of 0.680 and 0.661, respectively, SOFT and pSOFT displayed a limited value for the prediction of 3-month major morbidity. This finding is in accordance with Schlegel et al., who found a c-statistic of 0.605 for the prediction of 3-month morbidity (CD>3a) for the SOFT score in a selected population of high-MELD recipients (MELD score≥30) [18].
The BAR score constitutes a promising novel tool developed by Dutkowski et al. which evaluates not only donor but also easily accessible recipient risk factors. In our cohort, the BAR score was the only measure to predict 90-day morbidity with a reasonable accuracy (AUROC>0.7), hence it seems to be suitable to stratify patients based on both 90-day mortality and morbidity. While other authors found AUROCs below 0.7 for the prediction of 90-day mortality using the BAR score [31,32], in the present study ROC analysis revealed a convincing AUROC of 0.847 for the prediction of 90-day mortality and a solid AUROC of 0.709 for major morbidity. This is in line with the findings of Schlegel et al., who reported a related c-statistic of 0.754 for severe complications (CD≥3b) and 0.734 for 90-day mortality in case of the BAR score [18]. The robust nature of the BAR score in predicting outcomes has been confirmed in other cohorts including pediatric/adolescent patients as well as in recipients of living donor liver transplantation [33,34].
In our subsequent analysis, the subgroups of patients with high BAR, pSOFT, and SOFT scores performed significantly worse in terms of almost all assessed short-term outcome measures including major complications, cumulative CCI scores, 90-day mortality, and the length of ICU and hospital stay. pSOFT appeared to offer a potential benefit over the BAR and SOFT scores in stratifying patients at risk based on the incidence of EAD; however, further studies are needed to confirm this finding. As expected from the general performance of the DRI and ET-DRI scores in the ROC analysis, the DRI and ET-DRI cutoffs used failed to stratify the patients into low- and high-risk groups with regard to 90-day morbidity and mortality.
The association between early morbidity and mortality and the BAR, SOFT, and pSOFT scores was further supported by their significant correlation with the length of ICU stay, days of in-hospital care and 90-day CCI values. None of these factors showed a significant association with either the DRI or the ET-DRI score.
Although the cutoff values used for the BAR, pSOFT, SOFT, DRI, and ET-DRI scores were not optimized for long-term survival, the clinical value of the BAR, pSOFT, and SOFT scores was strengthened by their significant association with 5-year patient (and graft) survival (Fig 1).
Of note, all five investigated scores have been criticized for lacking certain well-recognized donor risk factors such as the presence and severity of graft steatosis. The lack of clear guidelines on a standardized biopsy-harvesting approach within Eurotransplant and the non-standardized semi-quantitative pathological assessment of steatosis (macro- versus microvesicular steatosis) constitute significant barriers to the incorporation of this important risk factor into prognostic OLT models. Moreover, other important risk factors such as CIT are sometimes difficult to estimate [13].
The interpretation of our findings is certainly limited by the sample size and the retrospective nature of our single-center assessment. Notwithstanding these limitations, this report is one of the first comprehensive studies assessing and comparing the value and limitations of five different clinical outcome scoring systems for OLT, demonstrating the potential value of the BAR, SOFT, and pSOFT scores and an inferior performance of the DRI and ET-DRI scores. It should be noted that in the present study only the BAR score was able to predict both 90-day major morbidity and mortality with a high accuracy (AUROC>0.700). Based on its excellent value in predicting both 90-day mortality and major morbidity and its easy and feasible calculation, the BAR score might become a useful tool in the German allocation system to predict postoperative outcomes. Given the promising results observed with the SOFT/pSOFT scores as well, future studies should evaluate the clinical feasibility of these complex scores and any potential benefits compared to the BAR score in various patient cohorts using patient and graft survival as primary endpoints. Despite these encouraging results, validation in a prospective multicenter setting is warranted before implementing any of these prognostic tools into routine clinical practice.