Can transplant renal scintigraphy predict the duration of delayed graft function? A dual center retrospective study

Introduction
This study focused on the value of quantitatively analyzed and qualitatively graded renal scintigraphy in relation to the expected duration of delayed graft function after kidney transplantation. A more reliable prediction of delayed graft function duration may result in a more tailored and patient-specific treatment regimen post-transplantation.

Methods
From 2000 to 2014, patients with early transplant dysfunction and a Tc-99m MAG3 renal scintigraphy, within 3 days post-transplantation, were included in a dual center retrospective study. Time-activity curves of renal scintigraphy procedures were qualitatively graded and various quantitative indices (R20/3, TFS, cTER, MUC10) were combined with a new index (Average upslope). The delayed graft function duration was defined as the number of days of dialysis-based/functional delayed graft function.

Results
A total of 377 patients were included, with a mean age (± SD) of 52 ± 14 years, and 58% were male. A total of 274 (73%) patients experienced delayed graft function ≥ 7 days. Qualitative grading for the prediction of delayed graft function ≥ 7 days had a sensitivity and specificity of 87% and 65%, respectively. The quantitative indices with the best results were cTER (76% sensitivity, 72% specificity) and Average upslope (75% sensitivity, 73% specificity).

Conclusions
Qualitative renal scintigraphy grading and the quantitative indices cTER and Average upslope predict delayed graft function ≥ 7 days with a high sensitivity. This finding may help to support both clinicians and patients in managing early post-operative expectations. However, the specificity is limited and thus renal scintigraphy does not reliably help to identify patients in whom the course of delayed graft function is longer than anticipated.



Introduction
Monitoring renal function after kidney transplantation (KTX) is pivotal to recognize posttransplant complications, such as vascular or urological complications, acute tubular necrosis (ATN) or rejection. [1] Therefore, an imaging modality with a high predictive value for post-KTX complications may result in a more tailored and patient-specific treatment regimen and possible reduction of early graft failure.
Delayed graft function (DGF) is a common complication after KTX, with an overall incidence between 2% and 50%, depending on the type of kidney graft (donation after circulatory death (DCD), donation after brain death (DBD) or living donation) and the definition used for DGF. [2] The most commonly used definitions for DGF are (1) dialysis-based DGF, the need for postoperative dialysis within the first 7 days after KTX, and (2) functional DGF, defined by a serum creatinine level failing to decrease by 10% on 3 consecutive days following KTX. [3,4] In the short term, DGF is associated with a longer duration of hospitalization, an increased risk of graft loss and higher mortality. [5][6][7]
Renal scintigraphy (RS) is a common and widely used test to assess graft function and complications after KTX. Previous studies introduced RS as a tool for the evaluation and/or prediction of DGF. [8][9][10][11] However, correct acquisition and interpretation are difficult, since several different radiopharmaceuticals and multiple quantitative indices are available. [12] In this study we focus on RS with Technetium-99m mercaptoacetyltriglycine (MAG3), the most commonly used radiopharmaceutical for post-KTX evaluation. [13] We evaluate the qualitative grading scale of Heaf and Iversen and the following four quantitative indices: the ratio of uptake at 20 and 3 minutes (R20/3), the tubular function slope (TFS; counts/second), the corrected tubular extraction rate (cTER; mL/min/1.73 m²) and the uptake within the first 10 minutes as a fraction of the injected dose (MUC10; counts/second/MBq). These indices were shown to correlate significantly with post-KTX outcomes in studies with smaller cohorts. [8,10,14,15] Earlier studies showed that RS cannot differentiate reliably between rejection and acute tubular necrosis (ATN) as a cause of DGF. [16,17]
We postulated that RS is able to predict the expected time course of recovery from DGF and may provide useful information to identify patients with additional allograft pathology. Improving the predictive value and clinical applicability of qualitative and quantitative RS indices could result in improved identification of the cause and expected duration of early graft dysfunction. This may facilitate image-guided treatment decisions, subsequently leading to a reduction in the number of diagnostic biopsies and faster treatment.
We performed a dual center retrospective study to evaluate the prognostic performance of RS, with regard to the duration of DGF after KTX, by using qualitative and quantitative analysis methodology.

Patients and methods
Patients were included in the University Medical Center Groningen (UMCG) from 2000 to 2014 (n = 177) and in Leiden University Medical Center (LUMC) from 2011 to 2014 (n = 200). All available post-KTX RS procedures performed within 3 days after KTX were included and patients' charts were screened for baseline characteristics and renal function within the first year, using the electronic hospital registries. In the UMCG cohort, RS was performed when transplant dysfunction was clinically suspected. In the LUMC cohort, RS was performed according to the standard post-KTX protocol, including cases with suspicion of vascular or urological complications and with ongoing DGF. Primary outcome was the duration of DGF, analyzed as a continuous and dichotomous variable. For dichotomous analyses, patients of both centers were divided into two groups based on graft function: (I) DGF, dialysis-based or functional delayed graft function for ≥ 7 days (n = 274); (II) non-DGF, no dialysis-based or functional delayed graft function lasting ≥ 7 days (n = 103). In both centers kidney biopsies were performed per protocol if DGF persisted for more than 7-10 days without sign of improvement. As acute rejection is generally not thought to subside spontaneously, it seems improbable that a significant number of rejections were missed with this approach. Patient data were processed and electronically stored according to the Declaration of Helsinki, Ethical principles for medical research involving human subjects. The local Medical Ethical Committees of both participating medical centers gave approval for this study and waived the need for written informed consent.
All Tc-99m MAG3 RS procedures were performed using an intravenous administration dose in a range of 80-100 MBq (2.16-2.70 mCi) and results were reanalyzed for the purpose of this study. Dynamic images were acquired with 1-second frames for two minutes (perfusion phase) followed by 20-second frames for 28 minutes (clearance phase). Reconstructed RS data were processed using Syngo.via (Siemens Medical Systems, TN, U.S.A.) and regions of interest (ROIs) were drawn manually. Correction for patient motion was applied when needed. The renal ROI was drawn proximal to the renal cortex and the background ROI was drawn as a crescent shape, opposite the renal artery and vein, resulting in a background-corrected time-activity curve from the start of the procedure up to 20 minutes. Patients were excluded if RS procedures could not be reanalyzed, since re-analysis of the original data was necessary to derive all quantitative indices for this study.
Time-activity curves were qualitatively graded using the adjusted qualitative grading scale of Heaf and Iversen (Fig 1). This qualitative grading scale divides RS curves based on uptake and excretion curve shapes. [14] For this study we combined the first two and final two grades of the qualitative grading scale of Heaf and Iversen, since the difference between these grades was small with an equal distribution of DGF and non-DGF patients. This resulted in the following grades: Grade 1, a normal RS curve, with fast uptake and excretion; Grade 2, normal uptake with flat excretion curve; Grade 3, rising curve without excretion phase; Grade 4, reduced absolute uptake without excretion phase.
Quantitative assessment of RS curves was performed with the four most common indices (R20/3, TFS, cTER, MUC10) together with a newly developed index, Average upslope (Fig 2). The R20/3 index is a retention-to-uptake ratio, calculated by dividing the Tc-99m MAG3 uptake at 20 minutes by the uptake at 3 minutes. [18,19,20] The tubular function slope (TFS) reflects the Tc-99m MAG3 uptake by renal tubular cells during the first minutes after Tc-99m MAG3 injection, presented as the linear fit of the curve between 50 and 110 seconds. [8,11] The corrected tubular extraction rate (cTER), in mL/min/1.73 m², reflects the Tc-99m MAG3 uptake corrected for the body surface using the renal uptake rate for 1 to 2 minutes after injection and the following regression equation: 9.825X + 11.258 (wherein X is the renal uptake). [15] MUC10 is an index describing the Tc-99m MAG3 uptake, within the first 10 minutes, as a fraction of the injected dose. [10] Since we hypothesized that the upslope of the RS curve is most influenced by the early graft function, we introduced a new quantitative index, Average upslope, reflecting the fast rise of the RS curve. The upslope of the RS curve ended between 2 and 3 minutes in the current cohort of Tc-99m MAG3 procedures, which resulted in the following formula: Average upslope = (counts at 3 minutes - counts at 20 seconds) / 160 seconds.
Fig 2. Average upslope, the slope between counts at 20 seconds and counts at 3 minutes. MUC10 resembles the uptake within 10 minutes as a fraction of the injected dose. R20/3, retention to uptake ratio, dividing uptake at 20 minutes by the uptake at 3 minutes.
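The index definitions above can be illustrated with a minimal Python sketch. It is not the Syngo.via processing used in the study: the function names, the linear interpolation between samples, and the toy curve are assumptions made for illustration only, and the unit of the renal uptake X in the cTER regression is not specified in the text, so `cter` simply applies the stated equation to whatever value is supplied.

```python
from typing import List, Sequence, Tuple


def interpolate(times: Sequence[float], counts: Sequence[float], t: float) -> float:
    """Linearly interpolate a background-corrected curve at time t (seconds)."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            return counts[i] + (counts[i + 1] - counts[i]) * (t - t0) / (t1 - t0)
    raise ValueError(f"time {t} s lies outside the sampled curve")


def r20_3(times: Sequence[float], counts: Sequence[float]) -> float:
    """R20/3: uptake at 20 minutes divided by uptake at 3 minutes."""
    return interpolate(times, counts, 1200.0) / interpolate(times, counts, 180.0)


def average_upslope(times: Sequence[float], counts: Sequence[float]) -> float:
    """(counts at 3 minutes - counts at 20 seconds) / 160 seconds."""
    rise = interpolate(times, counts, 180.0) - interpolate(times, counts, 20.0)
    return rise / 160.0


def tubular_function_slope(times: Sequence[float], counts: Sequence[float],
                           start: float = 50.0, end: float = 110.0) -> float:
    """TFS: least-squares slope of the curve between 50 and 110 seconds."""
    pts = [(t, c) for t, c in zip(times, counts) if start <= t <= end]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_c = sum(c for _, c in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in pts)
    return sxy / sxx


def cter(renal_uptake: float) -> float:
    """cTER regression equation from the text: 9.825 * X + 11.258."""
    return 9.825 * renal_uptake + 11.258


def make_toy_curve() -> Tuple[List[float], List[float]]:
    """Hypothetical curve: linear rise for 3 minutes, then slow washout."""
    times = [10.0 * i for i in range(121)]  # 0 to 1200 s in 10 s steps
    counts = [10.0 * t if t <= 180.0 else 1800.0 - 0.5 * (t - 180.0) for t in times]
    return times, counts
```

For the toy curve, which rises by 10 counts per second during the first 3 minutes, both `average_upslope` and `tubular_function_slope` recover that rate, which is the behaviour expected when the early part of the curve is approximately linear.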

Statistical analysis
Baseline characteristics and clinical follow-up results are presented as mean and standard deviation (SD) when a normal distribution was assumed on the basis of a Q-Q plot or histogram, as median and interquartile range (IQR) for skewed data, and as frequency and percentage for categorical data. We compared baseline characteristics of both centers using Student's t-test or the chi-square test. We investigated the correlation of qualitative grades and quantitative indices with the duration of DGF using Spearman's rank correlation test. To assess the predictive value of the adjusted qualitative grading scale of Heaf and Iversen, we calculated the number of patients with DGF for each grade and the sensitivity / specificity, and positive / negative predictive value. To assess differences in distribution of the quantitative indices between the DGF patients and the non-DGF patients we performed a Mann-Whitney U test for each index (two-sided p-value < 0.05 = significant), since the quantitative indices were not normally distributed. A receiver-operating characteristic (ROC) curve analysis was used for each significant quantitative parameter, considering an AUC between 0.5-0.7 as a poor level of discrimination, between 0.7-0.8 as an acceptable level of discrimination, and above 0.8 as a reliable level of discrimination. [21] The highest value of the Youden Index (YI) (sensitivity + specificity - 1) was considered as the optimal cut-off point (a YI of 1 represents a perfect diagnostic test and a YI of 0 indicates that the diagnostic test is not effective). [22] The Kaplan-Meier method was used to visualize the differences between patients with RS values below or above the optimal cut-off. Differences between Kaplan-Meier curves were determined with the use of log-rank tests. Final results are presented as sensitivity / specificity, and positive / negative predictive value.
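The cut-off selection by Youden Index can be sketched as follows. This is an illustrative Python sketch, not the SPSS procedure used in the study. It assumes that a lower index value predicts DGF (a test is "positive" when the value falls below the cut-off), that each observed value is tried as a candidate cut-off, and that ties are resolved in favour of the lowest cut-off; the function names and the toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sens_spec(values: Sequence[float], has_dgf: Sequence[bool],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when value < cutoff is read as 'DGF predicted'."""
    tp = sum(1 for v, d in zip(values, has_dgf) if d and v < cutoff)
    fn = sum(1 for v, d in zip(values, has_dgf) if d and v >= cutoff)
    tn = sum(1 for v, d in zip(values, has_dgf) if not d and v >= cutoff)
    fp = sum(1 for v, d in zip(values, has_dgf) if not d and v < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden Index: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(values: Sequence[float],
                has_dgf: Sequence[bool]) -> Tuple[float, float, float, float]:
    """Return (cutoff, YI, sensitivity, specificity) for the highest YI."""
    best: Optional[Tuple[float, float, float, float]] = None
    for cutoff in sorted(set(values)):  # every observed value is a candidate
        sens, spec = sens_spec(values, has_dgf, cutoff)
        yi = youden_index(sens, spec)
        if best is None or yi > best[1]:  # strict '>' keeps the lowest cut-off on ties
            best = (cutoff, yi, sens, spec)
    if best is None:
        raise ValueError("no values supplied")
    return best


# Hypothetical toy data: eight index values and whether DGF >= 7 days occurred.
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
toy_dgf = [True, True, True, True, False, True, False, False]
```

On the toy data, `best_cutoff(toy_values, toy_dgf)` selects a cut-off of 5.0, where four of the five DGF cases fall below the cut-off and all three non-DGF cases fall at or above it.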
All statistical analyses were performed with the Statistical Package for the Social Sciences (IBM SPSS Statistics Version 22).

Baseline characteristics
In the UMCG, a total of 1690 kidney transplantations were performed from 2000 to 2014 and in the LUMC a total of 511 kidney transplantations were performed from 2011 to 2014. A total of 232 (13.7%) out of the 1690 patients in the UMCG and 200 (39.1%) out of 511 patients in the LUMC fitted the inclusion criteria. Subsequently, 377 patients were included after exclusion of 55 patients due to missing original RS data. Mean age (± SD) was 49 ± 14 and 55 ± 13 years, 57% and 59% were male, 17 (10%) and 24 (12%) patients underwent a preemptive KTX, and the median (IQR) time of (hemo)dialysis prior to transplantation was 32.0 (48.50-54.5) and 36.4 (14.3-57.3) months, respectively, for the UMCG cohort and LUMC cohort (Table 1). In the UMCG cohort, 17% of patients were transplanted after living donation, 41% after donation after brain death (DBD), and 42% after donation after circulatory death (DCD). In the LUMC cohort, living donation was seen in 16%, DBD in 25%, DCD in 47%, and simultaneous kidney-pancreas donation in 12% of patients. A total of 143 (81%) and 131 (66%) patients developed DGF for a duration of > 7 days in the UMCG cohort and LUMC cohort, respectively. One hundred twenty-six patients (46.0%) were stratified in the DGF > 7 days group based on dialysis-based DGF (dDGF) and 148 (54.0%) patients were stratified in this group based on functional DGF (fDGF). Renal needle biopsies were performed within the first 7 days in 35 out of 177 patients (18%) in the UMCG cohort and in 21 out of 200 patients in the LUMC cohort. Within the first year after KTX, a total of 137 (77%) and 129 (65%) patients received a renal biopsy (including 14-day biopsies) in the UMCG cohort and LUMC cohort, respectively. In the UMCG cohort, rejection was proven by renal biopsy in 14 out of 177 patients (8%) within the first 7 days and in 57 patients (32%) during the first year (including 14-day rejections). In the LUMC cohort, rejection was proven in 9 out of 200 (5%) patients within the first 7 days and in 36 patients (18%) during the first year. A significant difference in baseline characteristics was found for the variables 'age' (P<0.001), 'BMI' (P = 0.025), 'DGF > 7 days after KTX' (P<0.001), and biopsy results at 14 days and 1 year after KTX; other baseline characteristics were not significantly different.

Qualitative grading and DGF prediction
Results of qualitative grading of the RS curves are shown in Table 2. Kaplan-Meier curves (Fig 3) for patients with a qualitative grade above and below the optimal cut-off were significantly different (log-rank P<0.001). Qualitative grading for the prediction of DGF ≥ 7 days, with > grade 2 as cut-off, had a sensitivity and specificity of 87% and 65%, respectively, with a positive and negative predictive value of 87% and 65%.
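The four reported measures follow from a single 2×2 table, which the short Python sketch below makes explicit. The split of the 274 DGF patients into 238 above and 36 below the cut-off is taken from the rejection analysis reported later in the text. The split of the 103 non-DGF patients into 67 below and 36 above the cut-off is an assumption, back-calculated here from the reported 65% specificity, and is not stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # DGF >= 7 days, grade above the cut-off
    fn: int  # DGF >= 7 days, grade below the cut-off
    tn: int  # non-DGF, grade below the cut-off
    fp: int  # non-DGF, grade above the cut-off

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        """Positive predictive value."""
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        """Negative predictive value."""
        return self.tn / (self.tn + self.fn)


# tp and fn come from the text; tn and fp are ASSUMED (inferred from 65% of 103).
qualitative = ConfusionMatrix(tp=238, fn=36, tn=67, fp=36)
```

Under these assumed counts the number of false negatives equals the number of false positives, which would explain why the positive predictive value coincides with the sensitivity and the negative predictive value with the specificity. In general the predictive values depend on the proportion of patients with DGF in the cohort, whereas sensitivity and specificity do not.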

Quantitative analysis and DGF prediction
Differences in mean values for patients with (n = 274) or without DGF (n = 103) were statistically significant for cTER, Average upslope, TFS, MUC10 (P<0.001) and R20/3 (P = 0.042) (Tables 2 and 3). The levels of discrimination of TFS and MUC10 were considered acceptable, each with an AUC of 0.75 for DGF ≥ 7 days, and R20/3 had a poor level of discrimination, with an AUC of 0.57. The optimal cut-off values for parameters with a reliable or acceptable level of discrimination are presented in Table 2. AUC values were not significantly different between patients from the UMCG cohort and LUMC cohort for TFS (P = 0.226), cTER (P = 0.089), MUC10 (P = 0.758), and Average upslope (P = 0.096). Log-rank tests comparing Kaplan-Meier curves (Fig 3) for patients with a quantitative index value above and below the optimal cut-off were significant (P<0.001) for the indices TFS, cTER, MUC10, and Average upslope. The YI of cTER and Average upslope for predicting DGF ≥ 7 days was 0.48 for both indices, with a sensitivity of 76% and a specificity of 72% for cTER, and a sensitivity of 75% and a specificity of 73% for Average upslope. The YI for TFS and MUC10 for predicting DGF ≥ 7 days was 0.41 for both indices, with a sensitivity of 80% and a specificity of 61% for TFS (Table 2).

Differentiating ATN and rejection
Based on previously published studies, we examined whether patients with a DGF course longer than anticipated were more likely to have an additional pathology such as rejection. This group did not show more cases of rejection between day 7 and 14 compared to the group with expected DGF ≥ 7 days, based on qualitative grading: 4 (11%) patients with rejection out of the 36 patients below the cut-off (grade 2) and 24 (10%) patients with rejection out of the 238 patients above the cut-off. Excluding all cases of rejection between day 7 and 14 did not result in a better performance for predicting DGF ≥ 7 days.

Discussion
This study shows that a DGF duration of ≥ 7 days post-KTX can be predicted by using qualitative grading of RS curves and RS quantitative indices (cTER and Average upslope). The RS grades and indices showed a high sensitivity and positive predictive value for predicting a DGF duration longer than 7 days. However, the specificity of DGF prediction is insufficient to reliably identify patients with longer than expected DGF duration and a higher risk of acute rejection (AR). These findings can facilitate clinical management and help to inform patients on the expected course after KTX. RS does not help to identify patients with an increased risk of rejection who require renal biopsy.
Several previous studies described the value of qualitative and quantitative RS analyses using Tc-99m MAG3 or Tc-99m DTPA. However, only three of these studies focused on the prediction or correlation of RS curve grading and transplantation outcomes. [14,17,19] In 2000, the qualitative grading scale of Heaf and Iversen for Tc-99m MAG3 RS was first introduced, with a reported sensitivity of 76% for predicting a 10% serum creatinine rise within two days. [14] In two more recent studies, the applicability of a perfusion curve grading scale for Tc-99m DTPA RS was described, in which a sensitivity and specificity of 94% and 44%, respectively, were reported for the prediction of rejection within the first week after KTX, and a sensitivity and specificity of 39.5% and 93.7% for the prediction of a serum creatinine rise at three months after KTX. [17,19] Due to differences in Tc-99m DTPA and Tc-99m MAG3 curve shapes, caused by the higher renal extraction rate of MAG3 compared to DTPA, we could not apply this Tc-99m DTPA grading scale in our study. Three previous publications focused on DGF and reported, among other findings, a significant correlation between the MUC10 index and the occurrence of DGF within the first week post-KTX, concluding that a MUC10 value below the cut-off indicates a poorer graft prognosis. [8,10] In 2002, the TFS index was introduced for predicting DGF. In a small prospective study in 42 patients, the authors demonstrated a significant difference in TFS values between patients with DGF and patients with immediate graft function. [8] A more recent article by Yazici et al. described a high sensitivity and specificity (86.1% and 86.2%) for the identification of DGF by using the relatively new graft index (GI) in 179 patients. [9] Since our study focused on Tc-99m MAG3 RS, we were again unable to apply this new Tc-99m DTPA RS curve index. However, the presence of DGF is generally detected by serum creatinine measurement, and RS is only of additional value if it can provide information on the cause of DGF or the prognosis. In the current study, we focused on the prediction of the duration of DGF, for which TFS showed a sensitivity and specificity of 80% and 61% for DGF ≥ 7 days.
Some limitations of our study need to be addressed. First, the retrospective design of this study creates the risk of selection bias. RS procedures were performed at the physicians' discretion in the UMCG cohort, which contributed to a selection of patients with generally worse early post-transplant outcomes compared to the whole renal transplant population. This was confirmed by the higher number of rejections at day 14 post-KTX together with a higher percentage of patients with DGF after all types of donation, both showing a significant difference compared to the LUMC cohort. The high number of patients with DGF influenced the positive and negative predictive values, which should therefore be judged together with the provided sensitivity and specificity. Second, we had to exclude 55 patients because the RS results could not be retrieved due to incomplete data. However, we do not expect that this has altered our results, since the exclusion was due to technical problems and therefore did not lead to extra selection bias. Third, the long period of inclusion, 2000-2014, contributed to a variety of RS protocols. The main differences were the duration of the scan, between 20 and 40 minutes, and the software used for the RS analysis. This shortcoming was addressed by limiting the analyzed procedure duration to a maximum of 20 minutes and by using the new method for the quantitative analyses. Finally, we do not expect large differences in the injected dose of the radiopharmaceutical, since the deviation in administered doses was considered small. Additionally, we minimized the effect of the administered doses by correcting the quantitative indices for the administered dose.
In conclusion, the qualitative RS grading and the RS quantitative indices cTER and Average upslope predict a DGF duration longer than 7 days with a high sensitivity. This finding may help to support both clinicians and patients in managing expectations in the early post-operative period. However, due to the low specificity for DGF ≥ 7 days, RS does not contribute to the identification of patients with prolonged DGF due to superimposed rejection.