Validation of pathological grading systems for predicting metastatic potential in pheochromocytoma and paraganglioma

Purpose
The Grading system for Adrenal Pheochromocytoma and Paraganglioma (GAPP) was proposed for predicting the metastatic potential of pheochromocytoma and paraganglioma to overcome the limitations of the Pheochromocytoma of the Adrenal Scaled Score (PASS). However, to date, no study validating the GAPP has been conducted, and previous studies did not include mutations in the succinate dehydrogenase type B (SDHB) gene in the score calculation. In this retrospective cohort study, we validated the prediction ability of GAPP and assessed whether it would be improved by inclusion of the loss of SDHB immunohistochemical staining.

Methods
We divided the tumors into non-metastatic and metastatic groups based on the presence of synchronous or metachronous metastases. The GAPP score and PASS at the initial operation were measured. Moreover, we combined some GAPP parameters with the immunohistochemical staining of SDHB to obtain a modified GAPP (M-GAPP) score.

Results
Metastasis occurred in 15/72 (20.8%) patients, with a mean follow-up of 43.5 months. Loss of SDHB staining was more frequent (P = 0.044) in the metastatic group. The GAPP score (P = 0.006), PASS (P = 0.003), and M-GAPP score (P<0.001) were all higher in the metastatic group. Twelve of 40 (30.0%) moderately or poorly differentiated tumors, as defined by the GAPP score, and 12/34 (35.3%) tumors with a PASS ≥4 were metastatic. In contrast, 10/19 (52.6%) tumors with an M-GAPP score ≥3 were metastatic. The area under the curve of the M-GAPP score (0.822) was significantly higher than that of the GAPP (0.728) (P = 0.012), but similar to that of the PASS (0.753) (P = 0.411). The GAPP (P = 0.032) and M-GAPP scores (P = 0.040), but not PASS (P = 0.200), negatively correlated with metastasis-free survival.

Conclusion
The GAPP was validated, and M-GAPP, a combination of some GAPP parameters and loss of SDHB staining, might be useful for the prediction of the metastatic potential of pheochromocytoma and paraganglioma.


Introduction
Pheochromocytomas and paragangliomas (PPGLs) are rare catecholamine-secreting neuroendocrine tumors that arise from chromaffin cells of the adrenal medulla and extra-adrenal sites, respectively [1]. The majority of PPGLs are benign; however, approximately 10% of pheochromocytomas (PHEO) and 15-35% of paragangliomas (PGL) are malignant [2]. The World Health Organization currently defines malignancy solely by the presence of metastases in non-chromaffin tissue [3,4]. The 5-year overall survival rate of patients with metastases ranges from 30-92% [2,5,6], whereas that of patients without metastases is 89.3% [7]. The main cause of death during the follow-up period is the occurrence of metastasis, even in patients who were initially diagnosed with benign PPGLs [7]. Therefore, it is very important to predict the metastatic potential of PPGLs, because patients with high-risk tumors should be followed up more aggressively.
Attempts to develop effective systems for predicting the metastatic potential of PPGLs using multiple histological parameters have been made, because no individual finding has been shown to be sufficiently reliable to allow a tumor to be confidently dismissed as benign. For example, the Pheochromocytoma of the Adrenal Scaled Score (PASS) [8] and the Grading system for Adrenal Pheochromocytoma and Paraganglioma (GAPP) [9] have been used in previous studies. The PASS is a weighted score comprising 12 specific histological features that are more frequently identified in metastatic pheochromocytoma [8]. The PASS has some limitations, such as its applicability only to pheochromocytoma and its high inter-observer and intra-observer variation, even among expert pathologists [10]. To overcome these limitations, the GAPP was recently developed [9]. This score excludes some of the poorly concordant histological features in the PASS and additionally includes the biochemical phenotype [11]. However, to date, there has been no validation study of the GAPP system [11,12], and its clinical use is thus limited. More importantly, the lack of inclusion of mutations in the succinate dehydrogenase gene subunit B (SDHB) was pointed out as a limitation of the GAPP system [11,12], because SDHB mutations are well known to be strongly correlated with the metastatic potential of PPGL [1]. Several studies have shown that SDHB gene mutations can be detected by the loss of SDHB staining on immunohistochemistry (IHC) [13,14]. Although one study did not include this factor in the GAPP grading system, it suggested that a combination of loss of SDHB staining with GAPP might be useful to predict metastatic potential [9]. Therefore, in this retrospective cohort study, we aimed to (1) validate the GAPP and (2) determine the improvement in predictive ability by using a modified GAPP (M-GAPP) score, comprising a combination of the loss of SDHB staining with some GAPP parameters, by comparing it with the PASS and GAPP score.

Patients and tissues
The study population consisted of 72 PPGL patients who were diagnosed at Asan Medical Center, Seoul, Korea from July 2007 to August 2016 (S1 File). Paragangliomas of the head and neck usually arise from parasympathetic neuronal tissue and behave differently, for example by not secreting catecholamines, and were therefore excluded from this study. We obtained the patients' clinical information, including sex, age at initial diagnosis, location of primary tumor, and 24-hour urinary metanephrine secretion.
All study participants provided written informed consent. This study was approved by the Institutional Review Board of Asan Medical Center.

Biochemical testing
We measured urine metanephrines, including 24-hour urinary fractionated metanephrine (UMN) and normetanephrine (UNM), using the Agilent 1100 high-performance liquid chromatography system (Agilent Technologies, Santa Clara, CA, USA) with an electrochemical detector, and using a high-performance liquid chromatography kit (Chromsystems, Munich, Germany). A tumor was defined as functional when the UMN or UNM was elevated. We used symptom-dependent cut-offs of urine metanephrines, as previously described [15].

Baseline histological analysis
For the histological analysis, tumor size was grossly measured as the maximum diameter of the tumor specimen. All hematoxylin and eosin-stained slides of the 72 surgically resected PPGL specimens were reviewed by an experienced endocrine pathologist based upon the GAPP scoring system classification (Table 1) [9] and PASS (S1 Table) [8] in a blinded manner without knowledge of the clinical outcome. The tumors were classified into 3 differentiation types according to their GAPP scores: well differentiated (WD; 0-2), moderately differentiated (MD; 3-6) and poorly differentiated (PD; 7-10). A PASS ≥4 was defined as having increased metastatic potential, as compared to PASS <4.
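The score cutoffs described above can be written out as a small lookup. The following is a minimal sketch, not the authors' code; the function names `gapp_differentiation` and `pass_high_risk` are hypothetical and chosen only for illustration.

```python
def gapp_differentiation(gapp_score: int) -> str:
    """Map a GAPP score (0-10) to the differentiation type given in the text."""
    if not 0 <= gapp_score <= 10:
        raise ValueError("GAPP score must be between 0 and 10")
    if gapp_score <= 2:
        return "WD"  # well differentiated: 0-2
    if gapp_score <= 6:
        return "MD"  # moderately differentiated: 3-6
    return "PD"      # poorly differentiated: 7-10


def pass_high_risk(pass_score: int) -> bool:
    """Return True when PASS >= 4, the cutoff for increased metastatic potential."""
    return pass_score >= 4


print(gapp_differentiation(2), gapp_differentiation(3), gapp_differentiation(7))
# prints: WD MD PD
print(pass_high_risk(3), pass_high_risk(4))
# prints: False True
```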
IHC staining for Ki-67 was performed using an automated IHC staining instrument (Benchmark; Ventana Medical Systems, Tucson, USA), the UltraView Universal DAB kit (Ventana Medical Systems), and a Ki-67 antibody (clone MIB-1, 1:200 dilution; DAKO, Glostrup, Denmark). The Ki-67 labeling index was determined by a formal manual count restricted to PPGL cells. Before counting, each slide was surveyed to select the hottest spot of positively stained PPGL cells, and 2-3 static images were obtained for each case. For all cases, at least 1000 cells were independently and manually evaluated. The labeling index was defined as the number of Ki-67-positive cells per 100 PPGL cells in the hottest spot.
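As a worked illustration of the labeling index defined above (positive cells per 100 counted PPGL cells), the sketch below computes the index from raw counts and places it in the bands referred to elsewhere in the text (<1%, 1-3%, >3%). The function names and the example counts are hypothetical.

```python
def ki67_labeling_index(positive_cells: int, counted_cells: int) -> float:
    """Ki-67-positive cells per 100 counted tumor cells (a percentage)."""
    if counted_cells <= 0:
        raise ValueError("counted_cells must be positive")
    if not 0 <= positive_cells <= counted_cells:
        raise ValueError("positive_cells must lie between 0 and counted_cells")
    return 100.0 * positive_cells / counted_cells


def ki67_band(index_percent: float) -> str:
    """Place a labeling index in the <1%, 1-3%, or >3% band."""
    if index_percent < 1.0:
        return "<1%"
    if index_percent <= 3.0:
        return "1-3%"
    return ">3%"


# Hypothetical count: 25 positive cells among 1000 counted in the hottest spot.
index = ki67_labeling_index(25, 1000)
print(index, ki67_band(index))
# prints: 2.5 1-3%
```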
IHC staining of SDHB was also performed using an automated IHC staining instrument (Benchmark; Ventana Medical Systems), the UltraView Universal DAB kit (Ventana Medical Systems), and an SDHB antibody (rabbit polyclonal HPA002868, 1:400 dilution; Sigma-Aldrich, St Louis, MO, USA). Cases with any definite granular cytoplasmic staining (mitochondrial pattern) were scored as 'positive' (Fig 1A). The proportion of positive granular cytoplasmic staining varied greatly between positive cases. Weak diffuse cytoplasmic staining was occasionally and heterogeneously observed in combination with definite granular cytoplasmic staining; such cases were scored as 'positive'. Cases with completely absent staining or only weak diffuse cytoplasmic staining, in contrast to the positive internal controls (endothelial cells, sustentacular cells, and lymphocytes), were scored as 'negative' (Fig 1B).
Catecholamine type was assigned as follows: if the urine fractionated metanephrine (UMN) level was high, with or without an elevated urine fractionated normetanephrine (UNM) level, the tumor was classified as adrenergic type; if the UNM level was high without an elevated UMN level, it was classified as noradrenergic type (Table 1; https://doi.org/10.1371/journal.pone.0187398.t001).

Definition and clinical assessment of metastatic PPGL
Metastatic PPGL was defined as the presence or recurrence of metastatic lesions at sites where neuroendocrine tissue is normally absent [4]. For intra-abdominal lesions, only lymph node metastasis was counted as metastatic PPGL, to rule out pheochromocytomatosis [16]. Most metastatic lesions were confirmed histologically; the exceptions were 5 patients whose metastatic PPGLs were confirmed by computed tomography and 123I-metaiodobenzylguanidine imaging, as well as by functional studies. Patients with metastases were subdivided into two groups, synchronous and metachronous metastases, defined as those who had metastatic lesions at the time of or <6 months after diagnosis of the primary tumor and those who developed metastases ≥6 months after the initial time of diagnosis and/or resection of the primary tumor, respectively [3].
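The two classification rules stated above (catecholamine type, and synchronous versus metachronous metastasis) can be expressed as simple decision functions. This is a minimal sketch under the definitions in the text; the function names are hypothetical, and the "non-functional" label for tumors with neither metabolite elevated follows the definition of a functional tumor given under Biochemical testing.

```python
from typing import Optional


def catecholamine_type(umn_elevated: bool, unm_elevated: bool) -> str:
    """Adrenergic if UMN is high (with or without UNM); noradrenergic if only UNM is high."""
    if umn_elevated:
        return "adrenergic"
    if unm_elevated:
        return "noradrenergic"
    return "non-functional"


def metastasis_timing(months_after_diagnosis: Optional[float]) -> str:
    """Synchronous if metastasis is found <6 months after diagnosis, else metachronous.

    Pass None when no metastasis was observed during follow-up.
    """
    if months_after_diagnosis is None:
        return "non-metastatic"
    if months_after_diagnosis < 6:
        return "synchronous"
    return "metachronous"


print(catecholamine_type(True, True), catecholamine_type(False, True))
# prints: adrenergic noradrenergic
print(metastasis_timing(0), metastasis_timing(6), metastasis_timing(None))
# prints: synchronous metachronous non-metastatic
```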

Statistical analysis
Data are presented as the mean±standard deviation, median [interquartile range (IQR)], or as numbers (percentages). The patients' baseline characteristics were compared using Student's t test or the Mann-Whitney U-test for continuous variables, or the chi-square test for categorical variables. Univariate and multivariate Cox proportional hazards regression models were evaluated to assess the association of each parameter of the GAPP, PASS, and M-GAPP with the risk of malignancy. The abilities of the GAPP score, PASS, and M-GAPP score to predict malignancy were quantified using the area under the curve (AUC) from receiver-operating characteristic (ROC) analysis. The metastasis-free survival (MFS) was defined as the interval between surgery and the date of diagnosis of the first metastasis. Correlations of the GAPP score, PASS, and M-GAPP score with the MFS were analyzed using Pearson's correlation analyses. We used the Kaplan-Meier method to estimate the MFS and the log-rank test to compare the MFS between the groups. All tests were 2-sided, and P<0.05 was considered statistically significant. All statistical analyses were performed using SPSS version 18.0 (SPSS Inc., Chicago, IL, USA).
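To make the ROC-based quantities concrete, the sketch below computes an AUC and a Youden-index cutoff in plain Python. It is an illustration only: the analyses in this study were run in SPSS, the function names are hypothetical, and the ten scores and labels are invented toy data, not patient data. The AUC is computed as the probability that a randomly chosen metastatic case has a higher score than a randomly chosen non-metastatic case, with ties counted as one half.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when score >= cutoff; the lowest cutoff wins ties.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data (hypothetical): 1 = metastatic, 0 = non-metastatic.
toy_scores = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]
toy_labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

print(round(roc_auc(toy_scores, toy_labels), 3))
cutoff, j = youden_cutoff(toy_scores, toy_labels)
print(cutoff, round(j, 3))
```

On real data the same quantities would normally come from a statistics package, which also provides confidence intervals and tests for comparing AUCs.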

Characteristics of the 72 PPGL patients
With a mean follow-up duration of 43.5 months after the initial operation, metastases occurred in 15 of 72 (20.8%) patients, including 5 (6.9%) synchronous and 10 (13.9%) metachronous metastases (Table 2). Patients tended to be younger (P = 0.061) and tumors tended to be larger (P = 0.060) in the metastatic than in the non-metastatic group. Patients with only the noradrenergic secretory type tended to be more common than those with the adrenergic secretory type (P = 0.097) in the metastatic than in the non-metastatic group. There were no significant differences in the levels of urine metanephrines between the two groups. A loss of SDHB staining was observed in 11 (15.3%) tumors, and was significantly more frequent in the metastatic (5 of 15, 33.3%) than in the non-metastatic group (6 of 57, 10.5%) (P = 0.044). The follow-up duration tended to be longer in the non-metastatic than in the metastatic group (P = 0.083).
We performed a subsequent analysis separately for PHEO and PGL (Table 3). Metastases occurred in 13 of 63 (20.6%) patients with PHEO and 2 of 9 (22.2%) patients with PGL. Patients tended to be younger (P = 0.072) in the metastatic PHEO than in the non-metastatic PHEO group. There were no significant differences in the levels of urine metanephrines or in the frequency of secretory type between the metastatic and non-metastatic groups in either PHEO or PGL patients. A loss of SDHB staining tended to be more frequent in the metastatic PHEO (4 of 13, 30.8%) than in the non-metastatic PHEO group (4 of 50, 8.0%) (P = 0.084).

Correlation of the individual parameters of the GAPP score, PASS, and M-GAPP score with the occurrence of metastasis
In the univariate analysis, 4 (67%) of 6 GAPP parameters (i.e., histological pattern, comedo-type necrosis, Ki-67 labeling index ≥3%, and noradrenergic type) were significantly different between the metastatic and non-metastatic groups (P<0.001-0.029) (Table 4). In the multivariate analysis, 2 (33%) parameters, namely the histological pattern and Ki-67 labeling index 1-3%, were significantly different between the two groups (P = 0.010-0.029). Of 12 PASS parameters, 5 (42%) in the univariate analysis and 4 (33%) in the multivariate analysis were significantly different between the two groups (S2 Table).
Exclusion of loss of SDHB staining from the GAPP parameters has been pointed out as one of the intrinsic problems with the GAPP. A previous study reported that loss of SDHB staining did not occur in WD PPGLs of the GAPP classification [9]; however, 2 of the 29 (6.9%) non-metastatic PPGLs with WD type showed a loss of SDHB staining in the present study (S3 Table). These findings suggest that a loss of SDHB staining per se is not sufficient to predict metastasis; hence, it might be appropriate as one of the parameters in the scoring system. Among the GAPP parameters, there was no significant difference in cellularity, Ki-67 labeling index between 1-3% or >3%, and capsular or vascular invasion. Therefore, we reconstructed the M-GAPP using a combination of the loss of SDHB staining with some of the GAPP parameters (Table 5). In the univariate analysis, all 6 (100%) M-GAPP parameters, that is, large and irregular cell nest or pseudorosette, comedo-type necrosis, vascular invasion, Ki-67 labeling index ≥1%, noradrenergic type, and loss of SDHB staining, were significantly different between the metastatic and non-metastatic groups (P<0.001-0.029) (Table 5). In the multivariate analysis, 3 (50%) of these parameters, namely large and irregular cell nest or pseudorosette, comedo-type necrosis, and Ki-67 labeling index ≥1%, remained significantly different between the two groups (P = 0.001-0.049).

Comparison of the 3 scoring systems for predicting metastatic potential
We selected 3 as the best cutoff value of the M-GAPP score, which corresponded to the maximum Youden's index [17] in the receiver-operating characteristic curve analysis. Forty-eight of 53 (90.6%) PPGLs with M-GAPP <3 were non-metastatic, while 10 of 19 (52.6%) PPGLs with M-GAPP ≥3 were metastatic (P<0.001). This finding indicates that the negative predictive value of the M-GAPP scoring system for the occurrence of metastasis is comparable to that of the other two scoring systems, while its positive predictive value is the highest of the three.
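The predictive values quoted above follow directly from the 2x2 counts in the text. As a worked check, the sketch below derives them for the M-GAPP cutoff of 3: 10 metastatic and 9 non-metastatic tumors scored ≥3, and 5 metastatic and 48 non-metastatic tumors scored <3. The function name `diagnostic_summary` is hypothetical.

```python
def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, and predictive values from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts implied by the text: 10 of 19 tumors with M-GAPP >= 3 were metastatic,
# and 48 of 53 tumors with M-GAPP < 3 were non-metastatic.
m_gapp = diagnostic_summary(tp=10, fp=9, fn=5, tn=48)
for name, value in m_gapp.items():
    print(f"{name}: {value:.1%}")
# prints:
# sensitivity: 66.7%
# specificity: 84.2%
# ppv: 52.6%
# npv: 90.6%
```

The 84.2% specificity obtained here matches the figure cited for the M-GAPP in the Discussion.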
The AUC of the GAPP score for predicting metastatic PPGL (0.728) was similar to that of the PASS score (0.753) (P = 0.757). The AUC of the M-GAPP score (0.822) was significantly higher than that of the GAPP score (P = 0.012) and similar to that of the PASS score (P = 0.411) (Table 7).

Discussion
Our longitudinal study with a mean follow-up duration of approximately 4 years suggested that the GAPP classification might be a validated system for the prediction of metastatic potential and that our modified GAPP classification including a loss of SDHB staining might result in an improved ability to predict metastatic potential. The baseline GAPP score, PASS, and M-GAPP score were all higher in the metastatic group than in the non-metastatic group. The predictive ability of the M-GAPP score was better than that of the GAPP score, and was similar to that of the PASS. Higher GAPP and M-GAPP scores, but not PASS, were associated with a shorter MFS. Collectively, these findings suggest that, out of the 3 scoring systems, the M-GAPP might be the most useful for the prediction of metastasis in PPGL.

Table 6. Comparison of the GAPP score, PASS, and M-GAPP score for predicting metastatic potential in PPGL.
Determining the metastatic potential of PPGL, which is particularly important for guiding therapeutic interventions and patient management, primarily depends on histology. However, malignant PPGL can be diagnosed only after the development of metastases, which can sometimes occur as long as 20 years after the initial surgery [18]. To overcome this shortcoming, the PASS and GAPP systems were suggested to stratify primary tumors according to the risk of metastasis. Although these systems provide a reasonable prediction of metastases, several limitations still prevent either system from being generally accepted or officially endorsed [12]. In particular, the PASS had been applied only to pheochromocytoma and had a very poor concordance among expert pathologists in a validation study [10]. On the other hand, the GAPP has so far not been validated and does not include the presence of mutations in SDHB, which strongly correlates with metastatic potential [6,11,12].
Regarding the validation of the GAPP classification, in the present study the GAPP score was higher in the metastatic group and negatively correlated with the MFS, 90.6% of WD tumors were non-metastatic, all PD tumors were metastatic, and the MFS significantly differed between the 3 differentiation types of the GAPP classification. Therefore, the GAPP classification might be useful to predict metastases, consistent with the results of a previous study [9]. However, there were some differences between the previous and present studies. First, 9.4% of WD PPGLs in the present study, compared with 3.6% (4 of 111) in the previous study, developed metastases. Second, a smaller proportion (22.2%) of MD PPGLs in the present study developed metastases than in the previous study (21 of 35 MD PPGLs, 60.0%) (P = 0.003). Third, only 2 parameters in the present study, in contrast to all 6 GAPP parameters in the previous study, were significantly associated with metastatic potential on multivariate analysis. Fourth, the tumor capsule was mostly incomplete or absent in our cases [3], and the assessment of cellularity was not easy, with potentially high inter-observer variation, similar to that for the Ki-67 labeling index. These results highlight several limitations of the GAPP classification.
Another limitation of the GAPP is that assessment of mutations in the SDHB gene is not included. A previous study showed that none of the WD PPGLs had a loss of SDHB staining and that 10 of 13 (77%) PPGLs with a loss of SDHB staining were metastatic [9], suggesting that a combination of the GAPP classification and SDHB IHC staining might be useful to predict metastases. However, because of the limited number of cases with negative SDHB staining among MD and PD PPGLs, the loss of SDHB staining was not included in the GAPP parameters in the previous study [9,11]. Although metastatic PPGLs showed more loss of SDHB staining than non-metastatic PPGLs in the present study, 6.3% of WD PPGLs also showed negative SDHB staining, and 6 of 11 (54.5%) PPGLs with a loss of SDHB staining were non-metastatic. These results indicate that a loss of SDHB staining per se is not sufficient to predict metastasis; hence, we included it as one of the M-GAPP parameters. Finally, we modified the GAPP by combining some useful parameters of the original GAPP and the loss of SDHB IHC staining.
When compared with the PASS and GAPP classifications, 52.6% of PPGLs with M-GAPP ≥3 developed metastases, while only 35.3% of PPGLs with PASS ≥4 and 30.0% of PPGLs with GAPP ≥3 developed metastases. The M-GAPP score, but not PASS, negatively correlated with the MFS, and the predictive ability of the M-GAPP score was greater than that of the GAPP score. Most of the improvement was seen in the specificity (84.2% for M-GAPP vs. 50.9% for GAPP). Collectively, the M-GAPP system was superior to the other 2 scoring systems in predicting the metastatic potential of PPGLs.
Our study has several limitations that should be addressed in future studies. First, PPGLs are rare neuroendocrine tumors, and we were hence only able to include a small number of Korean patients from our single medical center. Thus, further multicenter studies that include larger numbers of PPGLs from various ethnic groups are needed. Second, the predictive ability of the M-GAPP classification, particularly in terms of its sensitivity, should be further improved. Herein, our main aim was to validate the GAPP, so we did not consider the inclusion of other known clinical or genetic parameters reflecting metastatic potential such as age, location, size, methoxytyramine levels, and/or other molecular markers [11,18]. Furthermore, although the loss of SDHB IHC staining can predict the SDHB mutation, it can also be associated with SDHA, SDHC, and SDHD mutations [14]. Tumors with SDHA, SDHC, or SDHD mutations showed less aggressive clinical behavior than those with SDHB mutations [19], so the fact that loss of SDHB IHC staining is not specific to SDHB mutation can be a major limitation of the M-GAPP classification. Thus, further comprehensive research is needed to improve the predictive scoring system of PPGLs through combinations of potential clinical-histological-genetic parameters, including SDHB mutation, with the M-GAPP system.

Conclusion
Our data indicate that the GAPP classification might be a validated system for the prediction of metastatic potential. Moreover, the M-GAPP classification, which includes a loss of SDHB staining, might improve the ability to predict metastatic potential. Such risk stratification might be useful for personalized management of and as a screening strategy for PPGLs, as it could reduce both the costs of long-term follow-up and the risk of disseminated disease.