Prognostic nomogram predicts overall survival in pulmonary large cell neuroendocrine carcinoma

Background Large cell neuroendocrine carcinoma (LCNEC) is a rare and typically aggressive malignancy with poor prognosis. This study developed a nomogram model to predict the overall survival (OS) of patients with LCNEC. Methods Patients with LCNEC diagnosed from 2004 to 2014 were identified from the Surveillance, Epidemiology, and End Results database. Univariate and multivariate Cox regression models were used to determine demographic and clinicopathological features associated with OS. A nomogram model was generated to predict OS, and its performance was assessed by Harrell’s concordance index (C-index), calibration plots, and subgroup analysis by risk scores. Results Of 3048 eligible patients with LCNEC, 2138 were randomly assigned to the training set and 910 to the validation set. Age at diagnosis, gender, tumor stage, N stage, tumor size, and surgery of primary site were independent prognostic factors of OS. C-index values of the nomogram were 0.75 (95% CI, 0.74–0.76) and 0.76 (95% CI, 0.74–0.77) in the training and validation sets, respectively. In both cohorts, the calibration plots showed good concordance between the predicted and observed OS at 3 and 5 years. Kaplan-Meier curves revealed significant differences in OS in patients stratified by nomogram-based risk score, and patients with a higher-than-median risk score had poorer OS. Conclusion This is the first nomogram developed and validated in a large population-based cohort for predicting OS in patients with LCNEC, and it shows favorable discrimination and calibration abilities. Use of the proposed nomogram has the potential to improve prediction of survival risk and to support individualized clinical decisions for LCNEC.


Patients and methods
No written informed consent was obtained for this study because the data were de-identified and publicly available.

Data sources and patient selection
We retrieved data from 18 population-based cancer registries in the SEER program using SEER*Stat software (version 8.3.5) [15]. LCNEC was defined by the International Classification of Diseases for Oncology, Third Edition (ICD-O-3) histology code 8013 and site recode (lung and bronchus, trachea, mediastinum, and other respiratory organs). Patients were included if they were diagnosed between 2004 and 2014 and LCNEC was their first and only primary cancer. Cases identified only at autopsy or only from a death certificate were excluded. These selection criteria yielded 3048 eligible patients with LCNEC in the primary cohort. These patients were then randomly assigned in a 7:3 ratio to the training or validation set by simple random splitting, using R version 3.5.1 and the "caret" package.
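The 7:3 random assignment described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not with the R "caret" package used in the study, and the function name `split_train_validation` and the seed value are hypothetical. Group sizes from an exact 70% cut need not match the reported 2138 and 910.

```python
import random

def split_train_validation(patient_ids, train_fraction=0.7, seed=2019):
    """Shuffle patient IDs and cut them into training and validation sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # hypothetical fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Illustration with 3048 placeholder IDs (the size of the primary cohort).
train_ids, validation_ids = split_train_validation(range(3048))
print(len(train_ids), len(validation_ids))
```

Fixing the seed makes the partition repeatable, which matters because every downstream estimate depends on which patients fall into the training set.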

Study variables
The following data were obtained from the SEER database: demographics (gender, race, age at diagnosis, and year of diagnosis); tumor characteristics (primary site, laterality, histologic grade, tumor size, T, N, and M stages, and separate tumor nodules in the ipsilateral lung); and treatment information (surgery to other regional/distant sites, and surgery of primary site). Race in SEER is coded as white, black, American Indian/Alaskan, Asian/Pacific Islander, and unknown. Because the last three categories were small, they were grouped together as "other" in the analysis. The primary outcome was OS, defined as the time from diagnosis to death from any cause. Patients still alive on December 31, 2015 were censored based on the vital status recode in the SEER database. Surgery of the primary site and surgery to other regional/distant sites were each coded as a binary variable (yes or no).
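As a rough illustration of the outcome definition and race grouping above, the following Python sketch derives a survival time and event indicator with censoring on December 31, 2015. The helper names and the whole-month time calculation are assumptions made for illustration; SEER supplies its own survival-months field, which is what an actual analysis would use.

```python
from datetime import date

STUDY_CUTOFF = date(2015, 12, 31)  # censoring date stated in the text

def group_race(seer_race):
    """Collapse the sparse SEER race categories into 'other'."""
    return seer_race if seer_race in ("white", "black") else "other"

def overall_survival(diagnosis, death=None):
    """Return (months, event): event is 1 for death, 0 for censoring."""
    if death is not None and death <= STUDY_CUTOFF:
        end, event = death, 1
    else:
        # Alive at the cutoff (or death recorded after it): censor at the cutoff.
        end, event = STUDY_CUTOFF, 0
    # Assumption: time is counted in whole calendar months.
    months = (end.year - diagnosis.year) * 12 + (end.month - diagnosis.month)
    return months, event

print(overall_survival(date(2010, 3, 15), date(2011, 5, 1)))
print(overall_survival(date(2014, 1, 10)))
print(group_race("Asian/Pacific Islander"))
```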

Statistical analyses
Frequency distributions of demographic, clinical, and pathologic characteristics of the eligible patients were compared between the training and validation sets using the chi-square test. In the training cohort, univariate and multivariate Cox proportional hazards regression models were applied to calculate hazard ratios (HRs) and corresponding 95% confidence intervals (CIs) for demographic and clinical factors as predictors of OS. Variables subject to univariate analysis included age at diagnosis (<50, 50-59, 60-69, 70-79, and ≥80 years), race (black, white, and others), gender (female and male), primary site (main bronchus, upper lobe, middle lobe, lower lobe, overlapping lesion of lung, and lung NOS), laterality (left, right, and others), histologic grade (I, II, III, IV, and unknown), tumor stage (I, II, III, and IV), T stage (T1, T2, T3, T4, and Tx), N stage (N0, N1, N2, N3, and Nx), M stage (M0, M1a, M1b, and M1 NOS), tumor size (<20, 20-29, 30-39, 40-49, and ≥50 mm), separate tumor nodules in the ipsilateral lung (no separate nodules, in different lobe, in the same lobe, separate tumor nodules NOS, and unknown), surgery to other regional/distant sites (yes and no), and surgery of primary site (yes and no). Missing data were grouped as a separate category in the regression analysis. Significant variables (P < 0.05) in the univariate Cox regression analyses were included in the multivariate Cox regression analysis. The variables retained as significant in the multivariate model were incorporated into the nomogram.
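The categorical coding of age and tumor size listed above, with missing values kept as their own category, might be sketched as follows. The cut points come from the text; the function name `categorize` and the label strings are illustrative assumptions.

```python
from bisect import bisect_right

AGE_CUTS = [50, 60, 70, 80]            # years
AGE_LABELS = ["<50", "50-59", "60-69", "70-79", ">=80"]
SIZE_CUTS = [20, 30, 40, 50]           # millimetres
SIZE_LABELS = ["<20", "20-29", "30-39", "40-49", ">=50"]

def categorize(value, cuts, labels, missing="unknown"):
    """Map a numeric value to its category; missing values get their own level."""
    if value is None:
        return missing
    # bisect_right counts how many cut points are <= value,
    # which is exactly the index of the matching label.
    return labels[bisect_right(cuts, value)]

print(categorize(67, AGE_CUTS, AGE_LABELS))
print(categorize(50, SIZE_CUTS, SIZE_LABELS))
print(categorize(None, SIZE_CUTS, SIZE_LABELS))
```

Keeping "unknown" as an explicit level, rather than dropping those patients, mirrors the statement that missing data were grouped as a separate category in the regression analysis.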
Three-year and 5-year OS rates were the primary endpoints of this study. To test prognostic accuracy, the discrimination and calibration of the nomogram model were assessed with bootstrap resampling in the validation set [16]. Discrimination was quantified by Harrell's concordance index (C-index); a value of 0.5 indicates no discrimination, and a value close to 1 indicates strong predictive ability of the nomogram model. Calibration plots based on the Hosmer-Lemeshow goodness-of-fit test were developed to evaluate predictive accuracy and to show the concordance between the predicted and observed survival probabilities. In addition, patients were assigned to different risk categories based on their predicted risk scores. Kaplan-Meier survival curves by risk group and by the other significant prognostic factors were constructed and compared using the log-rank test to assess the discriminative ability of the nomogram. A 2-sided P < 0.05 was considered significant. All statistical analyses were performed using R version 3.5.1 with the "rms", "foreign", and "survival" packages.
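To make the discrimination measure concrete, here is a minimal pure-Python sketch of Harrell's C-index. It is a simplified illustration rather than the routine in the R "rms" package: pairs with tied survival times are simply skipped, and the toy data are invented for the example.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of usable patient pairs in which the higher predicted risk
    belongs to the patient who died earlier (risk ties count as 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            if times[i] < times[j]:  # patient i died before time j was reached
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

# Toy data (invented): survival in months, 1 = death, and a predicted risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [2.0, 1.5, 1.6, 0.3]
print(harrell_c_index(times, events, risks))  # 4 of 5 usable pairs agree: 0.8
```

The quadratic loop is adequate for illustration; a cohort of a few thousand patients would still run, but dedicated survival libraries handle tied times and give confidence intervals.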

Baseline characteristics of patients
During the years 2004 through 2014, 3048 patients with LCNEC were identified from the SEER database. Of these patients, 2138 and 910 were assigned to the training and validation cohorts, respectively. Table 1 shows the descriptive statistics for the patients in the two cohorts. In general, the training and validation cohorts were comparable in demographic and clinicopathological features. More than half of the patients were men (training vs validation, 53.37% vs 56.26%). More than 70% of the cases were diagnosed at an age older than 60 years. The majority of the patients were white (training vs validation, 84.52% vs 84.72%). The most common primary sites were the upper lobe (training vs validation, 58.75% vs 59.34%) and lower lobe (training vs validation, 27.22% vs 25.49%). Tumor grade was unknown in almost half of the patients (training vs validation, 49.20% vs 46.37%), followed by poorly differentiated grade (training vs validation, 36.58% vs 39.78%). The most frequent tumor size category was ≥50 mm, in 32.60% and 33.19% of the training and validation cohorts, respectively. More than 40% of the patients in both cohorts had no separate tumor nodules in the ipsilateral lung. In both cohorts, more than 50% received primary site surgery, and fewer than 10% received surgery to other regional and distant sites.
Table 2 shows the univariate results for the demographic and clinicopathological factors investigated and the final multivariate results for OS of LCNEC patients in the training cohort. In the univariate analysis, significant variables for OS included age at diagnosis (P < 0.001), gender (P < 0.001), primary site (P < 0.001), laterality (P = 0.017), tumor stage (P < 0.001), T stage (P < 0.001), N stage (P < 0.001), M stage (P < 0.001), tumor size (P < 0.001), separate tumor nodules in the ipsilateral lung (P < 0.001), surgery to other regional and distant sites (P < 0.001), and surgery of primary site (P < 0.001).

Univariate and multivariate analysis of prognostic factors in the training set
The prognostic factors that were statistically significant in the univariate models were included in the multivariate analysis. The factors that retained significance were age at diagnosis (P < 0.001), gender (P < 0.001), tumor stage (P < 0.05), N stage (P < 0.05), tumor size (P < 0.05), and surgery of primary site (P < 0.001). The factors contributing most to LCNEC prognosis in the nomogram were tumor stage, age at diagnosis, and surgery of primary site. The score assigned to each variable included in the nomogram is provided in Table 3. A total score was computed by summing the individual scores according to the demographic and clinical features of each patient, and the patient's probability of 3- and 5-year OS was obtained from the nomogram (Fig 1). The C-index values for OS prediction were 0.75 (95% CI, 0.74-0.76) and 0.76 (95% CI, 0.74-0.77) in the training and validation sets, respectively. The calibration plots for OS probability at 3 and 5 years showed that the concordance between the predicted and observed survival was optimal in both cohorts (Fig 2).
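The scoring step described above, in which each factor level carries points and a patient's total is their sum, can be sketched as below. The coefficients are hypothetical placeholders, not the values behind Table 3, and only three of the six predictors are shown. The scaling rule (the predictor with the widest coefficient range spans 0 to 100 points) is the usual nomogram convention and is assumed here rather than taken from the text.

```python
# Hypothetical log hazard ratios (reference level = 0.0); NOT the values in Table 3.
COEFFICIENTS = {
    "stage": {"I": 0.0, "II": 0.5, "III": 1.0, "IV": 1.6},
    "surgery": {"yes": 0.0, "no": 0.8},
    "gender": {"female": 0.0, "male": 0.2},
}

def build_point_table(coefficients, max_points=100.0):
    """Rescale coefficients so the widest-ranging predictor spans 0..max_points."""
    widest = max(max(b.values()) - min(b.values()) for b in coefficients.values())
    return {
        var: {level: max_points * (beta - min(betas.values())) / widest
              for level, beta in betas.items()}
        for var, betas in coefficients.items()
    }

def total_points(point_table, patient):
    """Sum the points for each of the patient's factor levels."""
    return sum(point_table[var][level] for var, level in patient.items())

points = build_point_table(COEFFICIENTS)
patient = {"stage": "III", "surgery": "no", "gender": "male"}  # invented example
print(total_points(points, patient))
```

In a real nomogram the total is then read against a second axis that converts points to 3- and 5-year survival probability; that conversion needs the fitted baseline survival, which is not reproduced here.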

Kaplan-Meier analyses
In the training cohort, with a median (range) follow-up of 65.5 (1-143) months, Kaplan-Meier analysis revealed a median OS of 13 (95% CI, 12-14) months. In the validation cohort, with a median (range) follow-up of 56 (1-143) months, the median OS was 12 (95% CI, 10-15) months. We calculated a risk score based on the independent prognostic factors that were significant in the multivariate analysis. Patients with LCNEC were then divided into high- and low-risk groups according to the median risk score of 1.275. Fig 3A shows the Kaplan-Meier curves by risk score group; the results indicated that the risk score was capable of distinguishing OS of patients with LCNEC (P < 2e-16). In addition, the Kaplan-Meier curves showed worse OS for patients with LCNEC who were of advanced age (≥80 years), were male, had a large tumor (20-29, 40-49, and ≥50 mm), had lymph node metastasis, had advanced-stage disease (stage II-IV vs I), or did not receive surgery of the primary site (Fig 3B-3G).
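The two operations in this section, splitting patients at the median risk score and estimating median OS from a Kaplan-Meier curve, can be illustrated with a small standard-library Python sketch. The function names and the toy data are invented for the example; it does not compute confidence intervals or the log-rank test used in the study.

```python
from statistics import median

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time."""
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths:
            survival *= 1.0 - deaths / at_risk  # product-limit update
            curve.append((t, survival))
    return curve

def median_survival(curve):
    """First time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up

def split_by_median(risk_scores):
    """Label each patient 'high' if above the median risk score, else 'low'."""
    cutoff = median(risk_scores)
    return ["high" if r > cutoff else "low" for r in risk_scores]

# Toy data (invented): months of follow-up and event flags (1 = death).
toy_times = [2, 4, 4, 6, 9, 12]
toy_events = [1, 1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(median_survival(toy_curve))
print(split_by_median([0.4, 1.0, 1.5, 2.2]))
```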

Discussion
We used SEER data to develop and validate a novel nomogram model for predicting OS for patients with LCNEC. To the best of our knowledge, this prognostic nomogram is the first developed for pulmonary LCNEC. The nomogram incorporated factors that were identified in a multivariate Cox analysis as independent prognostic factors for LCNEC, specifically, age at diagnosis, gender, tumor stage, N stage, tumor size, and surgery of primary site. The nomogram model exhibited high discriminative accuracy in the training cohort (C-index = 0.75), which was confirmed in the validation cohort (C-index = 0.76). In addition, the calibration plots confirmed good concordance for the prediction of OS at 3 and 5 years in both cohorts, suggesting excellent performance of this nomogram for estimating LCNEC prognosis. We took a population-based approach using the SEER program to develop the nomogram. SEER collects incident cancer cases from cancer registries representing approximately 28% of the United States population [17]. Because of the rarity of LCNEC, this nomogram study would be impossible if based on cases from a single institution or even multiple institutions. Furthermore, SEER is the only population-based program in the United States that provides follow-up information and comprehensively documents clinical data from medical records, including stage of cancer at diagnosis, grade, and therapy [18]. Given the robustness and completeness of the SEER database, the nomogram developed in this study can be expected to represent patients in the United States, with potential application to all patients with LCNEC. Currently, the criterion for assessing prognosis of neuroendocrine tumors is the TNM staging system, recommended by the IASLC [1,7]. However, the effectiveness of this system is unclear; small studies have reported conflicting results regarding the predictors of prognosis in LCNEC.
LCNEC histology is in general related to worse OS, but even for stage I LCNEC, patients receiving adjuvant chemotherapy achieved better OS than those receiving surgery alone [19]. In another single-institution retrospective study, the survival benefit of adjuvant therapy was apparent in patients with LCNEC stages II or higher, but negligible in patients with stage I [20]. However, due to limited sample size and the lack of generalizability of these studies, no definitive and sound conclusion has been drawn.
With access to the SEER data, our study comprises the largest population of LCNEC ever studied. The multivariate analysis indicated that the tumor features significantly associated with patient survival were tumor stage and N stage. The classic T and M stages failed to attain independent prognostic significance. Tumor stage spanned the full range of the point axis and contributed the most points in the nomogram, suggesting that it has the strongest influence on LCNEC prognosis. Our study suggests that the traditional TNM staging system may be insufficient for predicting prognosis of LCNEC. Surprisingly, we failed to find a significant effect of histologic grade on survival. This may be related to the high proportion of patients with unknown tumor grade.
Age at diagnosis was second only to tumor stage in the range it spanned on the point axis of the nomogram. Both the univariate and multivariate analyses showed that being 80 years and older was associated with a poorer OS compared with younger ages; the OS of patients aged 70 to 79 years was similar to that of younger patients. This is somewhat inconsistent with previous studies. In a recent retrospective study of LCNEC, the factors found to significantly contribute to poor survival were old age (>70 years), male gender, white race, and larger tumors (>30 mm) [9]. Older age (median, 65 years) has also been reported in other studies as a predictor of poor prognosis [4,21]. Our study also showed that tumor size larger than 50 mm was an independent prognostic factor associated with poor survival in patients with LCNEC. We also showed, for the first time, that male patients had poor OS, and LCNEC is often associated with male gender. Although exposure to heavy smoking is a possible prognostic factor for LCNEC, we were unable to assess its possible contribution to LCNEC survival because SEER does not collect smoking data on individual patients. Future large-scale studies are needed to address this topic. LCNEC is rare, and the available evidence is insufficient to tailor an optimal treatment plan specific to patients with LCNEC. Primary surgery remains the standard treatment for patients with stage I-II disease, but not all patients with stage I and II LCNEC benefit from surgery. In fact, the 5-year OS of patients with stage I disease who receive only surgery is low [22]. In the present multivariate analysis, surgical resection of the primary site was protective and was associated with better OS independently of other variables.
Metastatic patterns and their prognostic value in patients with LCNEC have been investigated in several previous studies, and the most common distant metastatic sites are the bone, liver, brain, and lung [22][23][24]. Metastasis to distant sites implies poor prognosis, and therefore special attention should be given to clinical management of distant metastases. In our univariate analysis, OS was worse in patients who had surgery to distant sites than in those who did not, but the association lost significance in the multivariate analysis. However, we could not exclude the possibility that the lack of significance was due to small sample size, especially in the surgery group.
The nomogram provides a personalized estimate of survival. Clinicians can use the total points provided by the nomogram to distinguish high-risk individuals from the general patient population, and pay closer attention during follow-up visits. High-risk patients could be selected to receive more aggressive treatment, or adjustments in treatment in response to changes in tumor features, or a recommendation for clinical trials of systemic therapy. In addition, palliative care service that includes symptom control and psychological support would benefit these patients at high risk for poor prognosis. This nomogram is a useful tool to identify patient subgroups with homogeneous OS within the LCNEC group and potentially assist personalized therapy. Additionally, this nomogram may be used as a prognostic tool to better counsel patients.
This study has several limitations. First, the nomogram was constructed based on the clinicopathological characteristics collected from the SEER database, and thus may not be a comprehensive prediction model for LCNEC prognosis. Mutational landscape differences have been observed between and within histological subtypes of lung cancer and have challenged the traditional histological classification [25]. Incorporating mutational landscape into the nomogram to improve the accuracy of predicting prognosis of LCNEC is promising, but the evidence is preliminary. Some studies advocate for further classification of LCNEC into mutually exclusive subtypes based on mutational signatures [26,27]. This is supported by recent studies indicating the potential role of mutational signatures in predicting therapeutic response to chemotherapy and prognosis for LCNEC [28,29]. In this study, we could not provide more insight into the mutational landscape related to LCNEC prognosis, because currently SEER does not capture information pertaining to tumor genetic signatures. The diagnosis of LCNEC requires confirmation of neuroendocrine differentiation, which is recognized by positive immunohistochemical (IHC) stains for CD56, chromogranin A, and synaptophysin. These neuroendocrine biomarkers have shown potential prognostic value in patients with lung cancer [30]. Similarly, since information regarding IHC profiles of these markers is not provided in the SEER database, evaluation and incorporation of these markers into the nomogram was not possible. Additionally, the lack of some clinicopathological data in the SEER database, such as smoking status, comorbidities, family history of cancer, and performance score, hampered our ability to assess these features in relation to LCNEC prognosis.
Second, SEER does not contain specific details on treatment regimens, which limited our ability to evaluate further the effect of treatment on LCNEC survival. Third, to validate nomograms, both internal and external validation sets are recommended, but only internal validation was applied in this study. This may weaken the generalizability of the results [11,31]. Therefore, before this nomogram can be implemented in a clinical setting, additional validation in an independent patient population is needed. Finally, the retrospective nature of SEER data may create a selection bias. Regardless of these inherent limitations, it is generally accepted that data in the SEER database are of high quality, and SEER is the most comprehensive database available for the objective of the current study.

Conclusion
In this study, a novel nomogram was developed and validated based on only six common demographic and clinicopathological variables. The nomogram can be used to individualize prediction of OS for LCNEC, which should facilitate clinical decision making at the individual level. More studies are needed to verify the generalizability of this nomogram and to improve it by incorporating the factors that could not be investigated in the present study.