Joint approach based on clinical and imaging features to distinguish non-neoplastic from neoplastic pituitary stalk lesions

Purpose Distinguishing non-neoplastic pituitary stalk lesions (non-NPSLs) from neoplastic pituitary stalk lesions (NPSLs) is a major concern in guiding treatment for a thickened pituitary stalk. Our study aimed to provide preoperative diagnostic assistance by combining clinical and magnetic resonance imaging (MRI) findings to distinguish non-NPSLs from NPSLs. Materials and methods We recruited 158 patients with thickened pituitary stalk lesions visible on MRI. Laboratory findings included hypopituitarism, diabetes insipidus (DI), and hyperprolactinemia. MR images were assessed for anterior–posterior thickness (mm), diffuse pituitary stalk thickening, cystic changes, a high T1 signal, and glandular or extrasellar involvement. A diagnostic model was developed using a recursive partitioning logistic regression analysis. The model was validated in an independent dataset comprising 63 patients, and its diagnostic performance was compared with that of the original radiological reports. Results A univariate analysis found significant associations of DI (P = 0.006), absence of extrasellar involvement (P = 0.002), and lower stalk thickness (P = 0.031) with non-NPSLs. A diagnostic model was created using the following parameters (in order of priority): 1) lack of extrasellar involvement, 2) stalk thickness < 5.3 mm, and 3) presence of DI. The diagnostic performance (area under the curve; AUC) of this model in the independent set was 0.813, representing a significant improvement over the original radiological reports (AUC: 0.713, P = 0.029). Conclusion The joint diagnostic approach based on clinical and imaging-based factors robustly distinguished non-NPSLs from NPSLs. This approach could guide treatment strategies and prevent unnecessary surgery in patients with non-NPSL.

Introduction of patients between January 2009 and March 2016. A text search of the radiology reports during the study period, using the terms "pituitary stalk" and "infundibulum", identified 2112 patients. Patients were subsequently excluded if 1) their radiological report read "normal pituitary stalk" and "normal pituitary infundibulum" (n = 1852) or 2) they had a history of surgery or treatment adjacent to the pituitary stalk (n = 102). For the remaining 158 patients, the initial brain or sellar-specific MRI that identified a pituitary stalk abnormality was used to characterize the lesion. Laboratory, pathological, and clinical evaluations, and follow-up MRI analyses, were performed on clinically indicated cases. Patients were excluded if they required an endocrinological evaluation for the final diagnosis but did not receive an endocrinological laboratory test.
The validation group was selected chronologically, and comprised 63 consecutive patients diagnosed with pituitary stalk lesions between April 2016 and September 2017. The reference standard for diagnosis was identical to that used for the training set and comprised a pathological or clinico-radiological diagnosis.

Hormonal evaluation
Each patient's hormonal status was evaluated before any medical treatment, surgical biopsy, or excision. The hormonal tests outlined below were performed for patients suspected to have anterior hypopituitarism. When the baseline serum measurements were abnormal, further dynamic confirmatory tests were performed. The gonadotropin axis was evaluated using the baseline serum levels of follicle-stimulating hormone (FSH), luteinizing hormone (LH), estradiol, and testosterone. The thyrotropin axis was evaluated using the baseline serum levels of thyroid-stimulating hormone (TSH), free thyroxine, and free triiodothyronine. The growth hormone (GH) axis was evaluated using the serum levels of GH and insulin-like growth factor-1. The corticotropin axis was evaluated using baseline measurements of serum cortisol and adrenocorticotropic hormone (ACTH) at 8 am and dynamic endocrine tests (an insulin-induced hypoglycemia test or a corticotropin test). Hypopituitarism was defined as deficient secretion in more than one axis of anterior pituitary hormones, including FSH, LH, ACTH, TSH, and GH. Secondary hormonal deficiencies were diagnosed on the basis of low levels of primary hormones, with corresponding low levels of trophic pituitary hormones. Hyperprolactinemia was defined as a serum prolactin level exceeding 20 μg/L in patients without a history of risperidone or metoclopramide medication [15]. DI was diagnosed on the basis of typical signs and symptoms [16] and documented through the measurement of sodium levels in the serum and urine as well as osmolality. More specifically, patients with DI have dilute urine (< 300 mOsm/kg H2O) with a urinary volume > 40 mL/kg/day. Twenty-one patients were subjected to a water deprivation test to further differentiate between partial central DI, partial nephrogenic DI, and primary polydipsia.
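The two numeric DI criteria stated above (urine osmolality below 300 mOsm/kg H2O and urinary volume above 40 mL/kg/day) can be expressed as a simple check. The following is a minimal sketch, not the study's actual workflow: the function name and parameters are hypothetical, and it covers only the numeric thresholds, not the clinical signs, sodium measurements, or water deprivation testing described in the text.

```python
def meets_di_urine_criteria(urine_osmolality_mosm_per_kg: float,
                            urine_volume_ml_per_day: float,
                            body_weight_kg: float) -> bool:
    """Return True when both numeric urine criteria quoted in the text are met.

    Hypothetical helper: the thresholds come from the text, everything else
    (name, signature, units of the inputs) is an assumption for illustration.
    """
    if body_weight_kg <= 0:
        raise ValueError("body_weight_kg must be positive")

    # Normalize the 24-hour urine volume to body weight (mL/kg/day).
    volume_ml_per_kg_per_day = urine_volume_ml_per_day / body_weight_kg

    dilute_urine = urine_osmolality_mosm_per_kg < 300   # < 300 mOsm/kg H2O
    polyuria = volume_ml_per_kg_per_day > 40            # > 40 mL/kg/day
    return dilute_urine and polyuria


# Illustrative values only (not patient data from the study).
print(meets_di_urine_criteria(250, 4000, 70))  # dilute urine and polyuria
print(meets_di_urine_criteria(250, 2000, 70))  # dilute urine, volume below cutoff
```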

Image acquisition
MRI was primarily performed using either a 3.0-T MRI scanner (Achieva or Ingenia; Philips Medical Systems, The Netherlands or Skyra; Siemens Healthcare, Germany) or a 1.5-T MRI scanner (Achieva; Philips), with an eight-channel head coil. The brain MRI protocol comprised a spin-echo sequence that included sagittal and axial T1-weighted imaging, axial T2-weighted imaging, axial fluid-attenuated inversion recovery imaging, and axial contrast-enhanced T1-weighted imaging. Following the injection of gadolinium-based contrast (gadoterate meglumine), a gradient-echo, contrast-enhanced, T1-weighted image was obtained and reconstructed into the axial, coronal, and sagittal planes. The imaging parameters were as follows: 1) sagittal spin-echo T1-weighted images: repetition time (TR)/echo time (TE) = 450 msec/9.5 msec, section thickness = 5.0 mm, field of view (FOV) = 20 cm; 2) gradient-echo contrast-enhanced T1-weighted images: TR/TE = 1800 msec/3.2 msec, section thickness = 3.0 mm, FOV = 25 cm. All patients underwent scanning with the enhanced brain MRI protocol.

Image analysis
Two independent neuroradiologists (J.E.P. with 5 years and J.Y.L. with 3 years of experience in neuroradiology) analyzed the MR images and recorded the following findings: AP thickness (mm), presence of a diffusely thickened pituitary stalk, cystic change, high T1 signal, pituitary gland involvement, and extrasellar involvement (Fig 1).
The AP thickness was measured at the maximal diameter on a midline, sagittal, contrast-enhanced T1-weighted image. Diffuse stalk thickening was defined as a lesion with diffuse, uniform thickening without a nodular or fusiform shape. Cystic changes were recorded when a non-enhancing T2 high-signal area was present on coronal or sagittal T2-weighted and contrast-enhanced T1-weighted images. High T1 signal foci were recorded when foci with a high signal relative to normal gray matter were detected on coronal or sagittal pre-contrast T1-weighted images. Extrasellar involvement was defined as the presence of a parenchymal or leptomeningeal enhanced lesion in other areas of the brain that were non-contiguous with the pituitary stalk or sellar fossa. Interobserver agreement was assessed, and discordant interpretations were resolved by consensus to create the diagnostic model. To determine AP thickness more precisely, the measured thicknesses obtained by the two readers were averaged.

Reference standards for diagnosis
The reference standard for diagnosis was constructed in consensus between a neurosurgeon (J.H.K. with 22 years of experience in neurosurgery) and a neuroradiologist (H.S.K. with 15 years of experience in neuro-oncological imaging). All available laboratory, pathological, and clinical evaluations and all follow-up MRI images were reviewed. A pathological or clinico-radiological diagnosis was considered the reference standard. Among the included patients, 102 had a confirmed pathological diagnosis, and 56 had been diagnosed using clinico-radiological information. The 56 patients for whom pathological specimens were not available were diagnosed using clinico-radiological information when (1) tissue obtained from other areas of the brain and the lesion involving the stalk had similar imaging appearances and (2) tissue obtained from an extracranial lesion and follow-up MRI, clinical, and laboratory findings strongly suggested a specific diagnosis. We additionally attempted to reduce bias by including patients who were followed up for at least 1 year (median follow-up of 35.5 months). The neuroradiologists who performed imaging analyses were blinded to clinical information, which thus separated possible predictors and reference standards.
For instance, hypophysitis was diagnosed when the lesion decreased with corticosteroid treatment and no growth appeared during follow-ups. Metastasis was diagnosed when (1) a rapidly growing stalk lesion was detected, (2) surgery or biopsy from other brain areas or an extracranial site showed a primary malignancy, and (3) the imaging findings were similar in the pituitary stalk.

Statistical analysis
The frequencies of imaging features of non-NPSLs and NPSLs were compared using the χ² test (for categorical variables) and t-test (for continuous variables). Interobserver agreement for each imaging feature was calculated using κ statistics.
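As an illustration of the κ statistic used for interobserver agreement, the sketch below computes Cohen's κ for two readers rating the same cases. It assumes the unweighted two-reader form of κ, which the text does not specify; the function name and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa for two readers rating the same cases."""
    if len(reader_a) != len(reader_b) or len(reader_a) == 0:
        raise ValueError("both readers must rate the same non-empty set of cases")

    n = len(reader_a)

    # Observed agreement: fraction of cases where both readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n

    # Chance agreement: product of each reader's marginal label frequencies.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both readers used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings (1 = feature present, 0 = absent) for ten cases.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reader_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(round(cohen_kappa(reader_1, reader_2), 3))  # 0.783
```

Here the readers agree on 9 of 10 cases (observed agreement 0.90), chance agreement is 0.54, and κ = (0.90 − 0.54) / (1 − 0.54) ≈ 0.78.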
The contribution of each imaging feature to differentiating non-NPSLs from NPSLs was evaluated using univariate and multivariate logistic regression models with a stepwise procedure. Based on these logistic regression analyses, a recursive partitioning tree classification algorithm was used to suggest a diagnostic tree model. An additional subgroup analysis of NPSLs was performed to identify significant predictors of pituitary metastases.
An independent radiologist used the validation set to make imaging diagnoses based on the suggested diagnostic model. The discriminatory power of the diagnostic tree model was assessed using a receiver operating characteristic curve analysis, wherein the areas under the receiver operating characteristic curves (AUCs) were evaluated and compared with those of the conventional radiological reports. To evaluate the influence of our combined approach, the AUCs of each imaging and clinical feature were compared with the results of the combined approach. A P-value of <0.05 was considered statistically significant. Statistical analyses were performed using the software package R, version 3.3.2 (http://www.R-project.org) and MedCalc Statistical Software version 17.1 (MedCalc Software, Ostend, Belgium).

Results
Table 1 summarizes the diagnostic assignment of the pituitary stalk lesions. Twenty-eight and 130 patients had non-NPSLs and NPSLs, respectively. The groups did not differ significantly in terms of age (mean ± standard deviation: 48.1 ± 17.8 vs. 48.4 ± 17.1 years, P = 0.938) or magnetic field strength for MR imaging (1.5 T:3.0 T, 9:19 vs. 34:96, P = 0.06). The male-to-female ratio was higher in the neoplastic group than in the non-neoplastic group (percentage of men: 46.4% in the non-neoplastic group vs. 57.7% in the neoplastic group, P = 0.01). Patients with non-NPSLs had a significantly higher rate of DI (76%, P = 0.006) than those with NPSLs.
The interobserver agreements for imaging features were excellent with κ values of 0.879 for the presence of a diffusely thickened pituitary stalk, 1.0 for cystic changes, 0.98 for a T1 high signal, 0.98 for pituitary gland involvement, and 0.99 for extrasellar brain involvement.
Among the imaging features, the AP thickness on sagittal images was significantly larger in patients with NPSLs (mean ± standard deviation, 10.1 ± 8.1 mm) than in those with non-NPSLs (5.27 ± 3.69 mm, P = 0.004). Patients with non-NPSLs more frequently had diffuse stalk thickening, but this was not a significant finding. In contrast, extrasellar involvement was observed more often in patients with NPSLs (P = 0.005). In a subgroup analysis of NPSLs, a multivariate analysis identified older age and extrasellar brain involvement as significant predictors of pituitary metastases. The result in the NPSL group is shown in Table 3.
[...] presence of a diffusely thickened pituitary stalk (OR: 4.66, P < 0.001), and the lack of extrasellar brain involvement (OR: 0.04, P = 0.002) were identified as potential predictors of non-NPSLs (Table 4).
Note: primary malignancies in metastasis: lung cancer (n = 27), breast cancer (n = 11), stomach cancer (n = 5), skin cancer (n = 1), melanoma (n = 1), and thyroid cancer (n = 1).

Developing a diagnostic model
In the multivariate stepwise regression, all above-listed variables remained independent factors that could distinguish non-NPSLs from NPSLs.
Next, a recursive decision tree was created using the above-mentioned significant variables in the training set data (S1 File). Four terminal nodes were produced in five splits. These terminal nodes were as follows (in order of priority): presence of extrasellar involvement, AP thickness with a cutoff of 5.25 mm, presence of DI, and presence of a diffusely thickened pituitary stalk. The established diagnostic model is shown in Fig 2. The diagnostic model correctly classified 141 (89.2%) of the 158 cases in the training set. Regarding non-NPSLs, the sensitivity and specificity of the diagnostic model were 65.5% and 94.6%, respectively. A receiver operating characteristic curve analysis was performed to compare the AUC of the diagnostic tree model with that of the radiological report. The AUC value of the diagnostic model was 0.828 (95% confidence interval [CI] 0.759-0.883), compared with 0.70 (95% CI 0.623-0.771) for the radiological reports; this difference was statistically significant (P = 0.037). Figs 3 and 4 show representative cases of non-NPSL and NPSL.
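The priority order of the splits can be read as a sequence of nested rules. The sketch below is one plausible reading only and is not the fitted tree in Fig 2: the split variables, their order, and the 5.25 mm cutoff are taken from the text, and the branch directions follow the reported associations, but the label assigned at each terminal node, the handling of a thickness exactly at the cutoff, and all names in the code are assumptions made for illustration.

```python
from dataclasses import dataclass

AP_THICKNESS_CUTOFF_MM = 5.25  # cutoff reported in the text


@dataclass
class StalkLesion:
    """Hypothetical container for the four split variables named in the text."""
    extrasellar_involvement: bool
    ap_thickness_mm: float
    diabetes_insipidus: bool
    diffuse_thickening: bool


def classify_lesion(lesion: StalkLesion) -> str:
    """Apply the splits in the stated order of priority.

    The terminal-node labels below are ASSUMED for illustration; the
    authoritative tree is the one shown in Fig 2 of the article.
    """
    # Split 1: extrasellar involvement was associated with NPSLs.
    if lesion.extrasellar_involvement:
        return "NPSL"
    # Split 2: thicker stalks were associated with NPSLs (boundary handling assumed).
    if lesion.ap_thickness_mm >= AP_THICKNESS_CUTOFF_MM:
        return "NPSL"
    # Split 3: DI was associated with non-NPSLs.
    if lesion.diabetes_insipidus:
        return "non-NPSL"
    # Split 4: diffuse thickening was associated with non-NPSLs.
    if lesion.diffuse_thickening:
        return "non-NPSL"
    return "NPSL"


# Illustrative case (not a study patient): thin stalk, DI, no extrasellar lesion.
example = StalkLesion(extrasellar_involvement=False, ap_thickness_mm=4.1,
                      diabetes_insipidus=True, diffuse_thickening=True)
print(classify_lesion(example))
```

Writing the rules as early returns mirrors how a reader would walk the tree from the root: each check is reached only if all higher-priority splits pointed toward the non-neoplastic branch.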

Validation and performance of the diagnostic model
The proposed diagnostic model was validated in an independent data set of 63 cases (20 non-NPSLs and 43 NPSLs). No significant differences were found between the study and validation groups in terms of age, sex, or final diagnosis.
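The AUC used to summarize diagnostic performance in the training and validation sets can be computed directly from its rank interpretation: the probability that a randomly chosen non-NPSL case receives a higher model score than a randomly chosen NPSL case, with ties counted as one half. The sketch below shows that calculation on made-up labels and scores; the function name and data are hypothetical, and the statistical comparison of two AUCs reported in the study is not reproduced here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the target class (here, non-NPSL), 0 otherwise.
    scores: higher values mean the model favours the target class.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive case ranked above negative case
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example: three non-NPSL cases (label 1) and four NPSL cases (label 0).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [1, 1, 0, 0, 0, 0, 1]
print(round(roc_auc(labels, scores), 3))
```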

Discussion
The present study suggests that a diagnostic approach incorporating clinical and MRI features of thickened pituitary stalk lesions could increase the probability of distinguishing non-NPSLs from NPSLs. This distinction is important, because non-NPSLs such as autoimmune hypophysitis can be treated medically, whereas NPSLs ultimately require surgery [3,17]. Our diagnostic approach yielded diagnostic performances of 0.83 (AUC) in the training set and 0.81 in the validation set. Furthermore, our approach represented a significant improvement over the original radiological reports. Given its simplicity and convenience, this joint approach may be useful for preoperative diagnosis in patients with a thickened pituitary stalk. Among the various imaging features of the pituitary stalk, only diffuse thickening was significantly associated with non-NPSLs, whereas neither cystic changes nor high T1 signal was a significant predictor. Diffuse stalk thickening may reflect the pathological features of lymphocytic hypophysitis, such as the diffuse infiltration of mainly mature lymphocytes, with islets of fibrosis and histiocytes [17,18]. Indeed, this condition was predominant among cases of non-NPSL in our population. On the other hand, an intrinsically high T1 signal in the pituitary posterior lobe, which reflects functional vasopressin storage [19], is frequently reported to be lost in cases of autoimmune hypophysitis but is conserved in the majority of pituitary adenomas [4]. However, many descriptive studies concerning the differentiation of pituitary stalk lesions did not report a relationship between a T1 high signal and the enhancement pattern [3,13,14]. Our findings support the scenario wherein a loss of high T1 signal intensity does not help to differentially diagnose thickened pituitary stalk lesions, as both non-NPSLs and NPSLs involve the pituitary stalk itself.
Conversely, the imaging features indicative of NPSL were distinct, and included a thickened pituitary stalk with a diameter exceeding 5.25 mm and extrasellar brain involvement. The difference in stalk thickness between non-NPSLs and NPSLs may be attributable to a difference at the time of clinical presentation, as studies have shown that the mean onset of clinical presentation of a pituitary tumor is 23 ± 35 months [20], whereas that of hypophysitis is 10 ± 18 months [21]. Regarding extrasellar involvement, non-NPSLs may exhibit a mass-like configuration and mimic sellar or suprasellar tumors [4,22] but are less likely to aggressively invade the surrounding suprasellar and bony structures. In addition, non-NPSLs may also be associated with extrasellar brain parenchymal lesions, such as neurosarcoidosis or tuberculosis [23,24], although these entities are relatively uncommon among non-NPSLs. The only patient with extrasellar brain involvement in the present study had tuberculous meningitis, with leptomeningeal enhancement in the basal cistern.
Interestingly, the most useful laboratory predictor of non-NPSLs was the presence of DI. DI can also be a clinical manifestation of NPSLs, as the posterior lobe has a rich arterial blood supply and can therefore be directly involved in metastasis through the systemic circulation [25]. In our study, the significance of DI as a predictor may be closely related to the onset of clinical presentation. Previous studies have shown that lymphocytic or granulomatous infiltration can induce early, direct, and definitive damage to pituitary cells [17,26], whereas in cases of stalk metastasis, tumor cell infiltration is associated with a delayed onset of DI [8], particularly when extrinsic compression predominates over destructive changes. This finding is also supported by our data showing that 21.7% of patients with metastases (10/46 patients) and 76% of patients with non-NPSLs had DI. More importantly, our results demonstrated an improvement in diagnostic performance when using laboratory findings rather than imaging features such as the presence or loss of a T1 bright high signal, which indicates a loss of posterior pituitary function. Conversely, the anterior pituitary function, or hypopituitarism/hyperprolactinemia, was not a useful parameter for differentiating non-NPSLs from NPSLs.
Transsphenoidal surgery is a safe and effective treatment for sellar lesions [4,27]. However, the pituitary stalk plays a critical role in hormonal function, and unnecessary surgery must be avoided. Previous research has revealed that imaging features differ from clinical features in a descriptive sense. Nonetheless, more analytical methods, such as measuring the importance of certain features, may guide clinicians when making differential diagnoses, especially when distinguishing non-NPSLs from NPSLs. Our study, which included a large data set, identified an AP stalk thickness threshold of 5.25 mm using routinely performed, sagittal T1-weighted images with slice thicknesses ranging from 3 to 5 mm. These MRI parameters could be easily applied in daily clinical practice. Furthermore, the diagnostic performance was improved by combining both clinical and imaging features, in contrast with the original radiological reports from the present study. Moreover, our study found robust results for the validation set.
However, our study was subject to several limitations. First, the retrospective nature of the study introduced the risk of selection bias. Our exclusion of patients with insufficient follow-up, as well as those who lacked a pathological specimen, may have excluded many patients with non-NPSLs, resulting in a larger number of patients in the neoplastic group. Second, hormonal tests were indicated clinically, and not all patients underwent testing for the entire pituitary axes of the anterior and posterior pituitary glands. Further studies involving complete laboratory tests of the entire pituitary axes might help to identify possible predictors among hormonal axes. Third, neither a quantitative analysis nor lesion signal enhancement was possible given the heterogeneous nature of the MRI machines used (1.5 T and 3.0 T). We attempted to minimize the effects of different magnetic field strengths by comparing the lesion signal with that of normal gray matter. In addition, our analysis did not use advanced MRI techniques, including dynamic contrast-enhanced (DCE)-MRI of the sellar fossa. A recent study [28] reported that DCE-MRI helps to localize and characterize pituitary lesions; therefore, a future study should use a homogeneous pulse sequence to characterize the signal intensity and dynamic pattern of contrast enhancement. Fourth, although our diagnostic model was tested on an independent data set, the patients were recruited from the same institution; therefore, the study is somewhat limited in terms of generalizability. A multicenter study with a large number of patients may further support our results. Finally, MRI characteristics are routinely implemented into clinical contexts, and the clinical and radiological predictors suggested in this study are not a completely new concept. However, our study is valuable because it has attempted to set an order of priority for the use of various clinical and MRI characteristics when distinguishing non-NPSLs from NPSLs.
In conclusion, our study identified the absence of extrasellar brain involvement, diffuse stalk thickening < 5.25 mm, and the presence of DI as significant predictors that differentiate non-NPSLs from NPSLs. Our proposed diagnostic model combined clinical and MRI characteristics to improve the diagnostic performance relative to the original radiological report. Such a model would facilitate decisions regarding further treatment strategies in clinical settings.