Combining Clinical, Pathological, and Demographic Factors Refines Prognosis of Lung Cancer: A Population-Based Study

  • Joseph Putila

    Affiliations: Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, West Virginia, United States of America; Department of Community Medicine, West Virginia University, Morgantown, West Virginia, United States of America

  • Scot C. Remick

    Affiliations: Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, West Virginia, United States of America; Department of Medicine, West Virginia University, Morgantown, West Virginia, United States of America

  • Nancy Lan Guo

    lguo@hsc.wvu.edu

    Affiliations: Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, West Virginia, United States of America; Department of Community Medicine, West Virginia University, Morgantown, West Virginia, United States of America


Correction

4 Mar 2011: Putila J, Remick SC, Guo NL (2011) Correction: Combining Clinical, Pathological, and Demographic Factors Refines Prognosis of Lung Cancer: A Population-Based Study. PLOS ONE 6(3): 10.1371/annotation/506974e6-6764-462c-b7d3-3cbd269c3d6c. https://doi.org/10.1371/annotation/506974e6-6764-462c-b7d3-3cbd269c3d6c

Abstract

Background

In the treatment of lung cancer, an accurate estimation of patient clinical outcome is essential for choosing an appropriate course of therapy. It is important to develop a prognostic stratification model which combines clinical, pathological and demographic factors for individualized clinical decision making.

Methodology/Principal Findings

A total of 234,412 patients diagnosed with adenocarcinoma or squamous cell carcinoma of the lung or bronchus between 1988 and 2006 were retrieved from the SEER database to construct a prognostic model. The model was developed by fitting a Cox proportional hazards model to 500 bootstrapped samples. Two models were constructed: one using stage alone and a comprehensive model using additional covariates. The comprehensive model consistently outperformed the stage-only model in prognostic stratification and on Harrell's C, Nagelkerke's R2, and Brier score, both in the whole patient population and within specific treatment modalities. Specifically, in Kaplan-Meier analyses the comprehensive model identified prognostic groups with distinct post-operative survival (log-rank P<0.001) among surgically treated stage IA and IB patients. Two additional patient cohorts (n = 1,991) were used for external validation, and the comprehensive model again outperformed the stage-only model with regard to prognostic stratification and the three evaluated metrics.

Conclusion/Significance

These results demonstrate the feasibility of constructing a precise prognostic model that combines multiple clinical, pathologic, and demographic factors. The comprehensive model significantly improves on AJCC tumor staging for individualized prognosis and is robust across a range of treatment modalities, across the spectrum of patient risk, and in independent patient cohorts.

Introduction

Lung cancer is one of the most aggressive cancer types and is consistently the leading cause of cancer-related death in the United States for both men and women, with around 215,000 new cases and 161,000 deaths annually [1]. Non-small cell lung cancer (NSCLC) accounts for about 80% of lung cancer cases. Although tumor stage is strongly predictive of survival in most cases, it does not account for the considerable variability in treatment outcome among patients of the same stage. Currently, surgery is the major treatment option for patients with stage I NSCLC. However, 35–50% of stage I NSCLC patients will relapse within five years [2], [3], and relapse is the major cause of treatment failure, i.e., death from lung cancer. Reliably identifying patients at high risk for tumor recurrence as candidates for adjuvant chemotherapy remains an unsolved challenge for physicians.

Recent studies have used a variety of information in addition to tumor stage for prognostic stratification and prediction of treatment outcome [4]–[12]. Prognostic factors such as age, gender, and tumor grade have been shown to be strongly associated with survival. Age is a well-established risk factor for the development of lung cancer and can also influence the type of treatment received, either through medical coverage or through co-morbid conditions that preclude certain therapies [13], [14]. Males diagnosed with lung cancer consistently experience poorer survival than females [15], and this gender difference persisted even when controlling for other variables such as tumor stage, age at diagnosis, and treatment.

Race has also been shown to be a significant predictor of survival, with Asians and Pacific Islanders experiencing better survival in both prospective [16] and population-based studies [17]. Although the underlying disease mechanism and genetic background are not well characterized, the consistency of this finding makes race useful for prognostication and treatment planning.

The emerging use of genetic markers may enable physicians to make treatment decisions based on the specific characteristics of individual patients and their tumors instead of population statistics [18]. This study presents an alternative avenue for improving personalized prognosis of NSCLC by combining clinical, pathological, and demographic factors in a population-based study (n = 234,412). The comprehensive model was tested across a number of treatment modalities and blindly validated on separate patient cohorts (n = 1,991). It achieved a significant improvement in prognostication compared with the AJCC tumor staging system, including cases converted to the AJCC 7th Edition [19]. This patient stratification scheme could be integrated with future clinically validated prognostic gene signatures for personalized prognosis of NSCLC.

Methods

Acquisition of Patient Cohorts

A cohort of patients diagnosed with lung cancer was retrieved from the Surveillance, Epidemiology, and End Results (SEER) database [20]. The SEER database aggregates registry data from specific geographic areas covering approximately 26 percent of the U.S. population and contains clinical, demographic, treatment, and follow-up information for a variety of cancers. Inclusion required a diagnosis of primary lung adenocarcinoma (ICD-O-3 8140 to 8380) or squamous cell carcinoma (ICD-O-3 8050 to 8080) between 1988 and 2006, as well as available data on tumor stage, tumor grade, race, age, gender, disease-specific survival, and treatment. Patients diagnosed via autopsy or death certificate, or without valid survival data, were excluded from the analysis. A total of 234,412 patients met the inclusion criteria. Patients staged using the AJCC 6th edition (in general, diagnoses from 2004 onward) were recoded to the 7th edition where possible, based on the staging changes proposed in the AJCC Staging Manual [19] and on the tumor size, extension, metastasis, and lymph node information in the SEER database. A total of 58,634 cases could be converted from the 6th to the 7th edition.
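The inclusion rules above can be expressed as a simple record filter. The following is a minimal sketch with a hypothetical `Case` record: its field names are assumptions and do not correspond to actual SEER variable names, and the restriction to a primary lung or bronchus site is assumed to have been applied already.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    # Hypothetical record layout; these are NOT the real SEER field names.
    histology_code: int                # ICD-O-3 histology code, e.g. 8140
    year_of_diagnosis: int
    diagnosed_at_autopsy_or_dc: bool   # diagnosis by autopsy or death certificate only
    survival_months: Optional[float]   # None when survival data are invalid
    stage: Optional[str]
    grade: Optional[int]
    race: Optional[str]
    age: Optional[int]
    sex: Optional[str]
    treatment: Optional[str]


def histology_group(code: int) -> Optional[str]:
    """Map an ICD-O-3 histology code to one of the two study groups."""
    if 8140 <= code <= 8380:
        return "adenocarcinoma"
    if 8050 <= code <= 8080:
        return "squamous"
    return None


def eligible(c: Case) -> bool:
    """Apply the histology, period, diagnosis-route and complete-data rules."""
    required = (c.stage, c.grade, c.race, c.age, c.sex, c.treatment, c.survival_months)
    return (
        histology_group(c.histology_code) is not None
        and 1988 <= c.year_of_diagnosis <= 2006
        and not c.diagnosed_at_autopsy_or_dc
        and all(v is not None for v in required)
    )
```

For example, `eligible(Case(8140, 2000, False, 24.0, "IB", 2, "White", 65, "F", "surgery"))` is `True`, while the same case diagnosed in 2007 is excluded.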

Two additional patient cohorts were used as validation sets. De-identified data were obtained for 1,552 patients with squamous cell carcinoma (n = 758) or adenocarcinoma (n = 794) treated at the Mary Babb Randolph Cancer Center (MBRCC) at West Virginia University from 1990 to 2009. The study was approved with an IRB exemption from West Virginia University. Under HIPAA regulations, de-identified clinical information can be used in research without prior consent from the patients. A further 439 lung adenocarcinoma cases with stage I-IIIB cancers were obtained from Shedden et al. [21]. These patients were treated at the H. Lee Moffitt Cancer Center, the University of Michigan Comprehensive Cancer Center, the Dana-Farber Cancer Institute, and Memorial Sloan-Kettering Cancer Center, and their data were previously published in Shedden et al. [21]. The patients provided consent, although it is not clear whether consent was written or verbal. The protocols were approved by the Institutional Review Boards (IRB-Med) of the respective institutions.

Conversion of Cases to AJCC 7th Edition

Cases diagnosed from 2004 onward could be converted to the AJCC 7th Edition. The original TNM staging information on tumor size and extension (T), lymph node status (N), and distant metastasis (M) was retrieved from the SEER data. The T, N, and M classifiers were then recoded according to the new guidelines [19] and used to determine the AJCC 7th Edition stage.
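As an illustration of the recoding step, the sketch below assigns a size-based T category using the AJCC 7th edition lung cut points and then looks up the 7th edition stage group from T, N, and M. It is a simplified assumption, not the authors' conversion code: extension and invasion findings that would raise T, and the separate-nodule rules, are ignored, and the function names are hypothetical.

```python
def t_category_7th_by_size(size_cm: float) -> str:
    """Size-based T category under AJCC 7th edition lung cut points.

    Only tumor size is used; extension or invasion findings that would
    upgrade the T category (to T3 or T4) are ignored in this sketch.
    """
    if size_cm <= 2.0:
        return "T1a"
    if size_cm <= 3.0:
        return "T1b"
    if size_cm <= 5.0:
        return "T2a"
    if size_cm <= 7.0:
        return "T2b"
    return "T3"


def stage_7th(t: str, n: int, m: int) -> str:
    """AJCC 7th edition stage group from T ('T1a'..'T4'), N (0-3) and M (0 or 1)."""
    if m == 1:
        return "IV"
    if n == 3:
        return "IIIB"
    if t == "T4":
        return "IIIB" if n == 2 else "IIIA"
    if n == 2:
        return "IIIA"
    if t == "T3":
        return "IIIA" if n == 1 else "IIB"
    if t == "T2b":
        return "IIB" if n == 1 else "IIA"
    if t == "T2a":
        return "IIA" if n == 1 else "IB"
    return "IIA" if n == 1 else "IA"  # T1a or T1b
```

For example, a 6 cm node-negative tumor without metastasis maps to T2b and stage IIA: `stage_7th(t_category_7th_by_size(6.0), 0, 0)`.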

Model Construction and Statistical Analyses

Disease-specific survival was analyzed primarily using a Cox proportional hazards model, which estimates the effect of covariates on the time from diagnosis until an event, in this case death from lung cancer. Four models were estimated, one for each combination of histology and AJCC staging system. For each patient cohort, 500 bootstrapped samples equal in size to the original adenocarcinoma or squamous cell carcinoma cohort were constructed. This method has been shown to be superior to split-sample techniques [22] and in general produces less biased estimates with smaller variance. A Cox model was then fit to each bootstrapped sample. To determine the advantage of using other covariates in addition to AJCC stage, two sets of covariates were evaluated. The first contained tumor stage and grade, patient age, race, and gender. The second contained only tumor stage and served as a model of current clinical practice. Because the distribution of hazard scores was normal, the final training model used the mean value of each coefficient across the bootstrapped samples. Hazard scores were then calculated for each patient in the original samples from this final model. The model relates the hazard h for patient i at time t to the coefficients β for covariates 1 through k with values x:

$$h_i(t) = h_0(t)\exp\left(\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}\right)$$

where h_0(t) is the baseline hazard.
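The bootstrap-and-average procedure can be sketched in a few lines. This is a minimal illustration under a strong simplifying assumption: a single numeric covariate, so that the Cox partial likelihood (with Breslow handling of ties) can be maximized by a hand-written Newton-Raphson loop. The function names are hypothetical; the authors used R's survival and Design packages with several covariates, which this sketch does not reproduce.

```python
import math
import random
from statistics import mean


def fit_cox_1d(times, events, x, iters=25, tol=1e-9):
    """Newton-Raphson fit of a one-covariate Cox model (Breslow ties)."""
    beta = 0.0
    n = len(times)
    for _ in range(iters):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # observed information (negative second derivative)
        for i in range(n):
            if not events[i]:
                continue  # censored subjects enter only through risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:  # subject j is still at risk at times[i]
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            xbar = s1 / s0
            score += x[i] - xbar
            info += s2 / s0 - xbar * xbar
        if info <= 0.0:
            break  # no information, e.g. constant covariate or no events
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def bootstrap_mean_beta(times, events, x, n_boot=500, seed=1):
    """Refit on n_boot same-size resamples drawn with replacement; average the coefficients."""
    rng = random.Random(seed)
    n = len(times)
    betas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        betas.append(fit_cox_1d([times[k] for k in idx],
                                [events[k] for k in idx],
                                [x[k] for k in idx]))
    return mean(betas)


def hazard_score(beta, x_value):
    """Linear predictor used to rank patients; higher means higher estimated hazard."""
    return beta * x_value
```

With several covariates, the same averaging would be applied coefficient by coefficient, and the hazard score would become the sum of β_k x_ik.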

For prognostic categorization, cutoff values defined from the bootstrapped samples were used to stratify patients into low-, intermediate-, and high-risk groups based on their individual hazard scores. The Cox model and cutoff values were then applied to the original cohort for validation. The prognostic categorization was evaluated with the Kaplan-Meier survival function, in which the estimated proportion surviving S at time t is the product, over all intervals i up to t, of the proportion of the n_i cases still at risk (neither dead nor censored) at the start of interval i that survive the d_i deaths in that interval:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - d_i}{n_i}$$
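A minimal sketch of the two steps in this paragraph: assigning a risk group from a hazard score with two cutoffs, and computing the Kaplan-Meier product-limit estimate for a group. The cutoff values are hypothetical parameters, since the article does not report them, and a censoring tied with a death at the same time is kept in that time's risk set, which is the usual convention.

```python
def assign_risk_group(score, low_cut, high_cut):
    """Three-way split on the hazard score; cutoffs are hypothetical inputs."""
    if score < low_cut:
        return "low"
    if score >= high_cut:
        return "high"
    return "intermediate"


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= (n_at_risk - deaths) / n_at_risk  # factor (n_i - d_i) / n_i
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censorings leave the risk set
    return curve
```

For instance, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])` steps down at times 1, 2, 3, and 5, with S(2) = 5/6 × 4/5 = 2/3.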

Patients still alive, or dead from unrelated causes, were censored at the time of last follow-up or death, respectively. Internal performance was measured using Harrell's C, Nagelkerke's R2, and Brier scores. Harrell's C is a measure of concordance analogous to the area under an ROC curve; it ranges from 0 to 1, with higher scores indicating greater concordance [23]. ROC curves for model evaluation were generated with the pROC package in R, and the statistical significance (P-value) of the difference between the areas under the curves was calculated using the DeLong method in the same package. A larger area indicates improved predictive ability. Nagelkerke's R2 is functionally similar to the R2 value in linear models, ranging from 0 to 1 with higher values explaining more variance; this variant is calculated on the log-likelihood scale. The Brier score represents the average squared prediction error and ranges from 0 to 1, with lower values indicating lower average error. The significance of risk-group stratification was determined using a log-rank test on the Kaplan-Meier functions. The log-rank test compares observed and expected deaths in contingency tables at each event time to determine whether two survival functions differ significantly. The model constructed on the training set was then further validated on SEER sub-cohorts and on patients from the MBRCC and Director's Challenge cohorts [21], without re-estimating the model parameters or cutoffs. Statistical analyses were conducted with the pamr, pec, Design, and survival packages in R v2.11.0.
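The three performance metrics can be illustrated with simplified, hand-written versions. This is a sketch, not the R implementations the authors used: this Harrell's C skips pairs tied in time, this Brier score drops subjects censored before the horizon instead of applying the inverse-probability-of-censoring weights that the pec package uses, and the Nagelkerke function takes null and fitted log-likelihoods as given inputs. All function names are hypothetical.

```python
import math


def harrell_c(times, events, risk):
    """Fraction of comparable pairs in which the earlier death has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the shorter time is a death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable


def brier_at(times, events, surv_prob, horizon):
    """Mean squared error of predicted survival past `horizon` (no IPCW)."""
    total = 0.0
    used = 0
    for t, e, p in zip(times, events, surv_prob):
        if t <= horizon and not e:
            continue  # censored before the horizon: status unknown, dropped
        alive = 1.0 if t > horizon else 0.0
        total += (alive - p) ** 2
        used += 1
    return total / used


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Cox-Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell
```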

Results

This study focused on the two major cell types of NSCLC, lung adenocarcinoma and squamous cell carcinoma. For each cell type, comprehensive models were constructed for the earlier AJCC staging system (the 3rd and 6th editions) and for the current AJCC 7th edition. The clinical characteristics of the SEER patient population are listed in Table 1, and the two external validation cohorts are summarized in Table 2. The bootstrapped model was used to generate a hazard score for each patient in the test data as a blinded validation, and the previously determined parameters and cutoffs were used to stratify patients in the original cohort into three risk groups based on each patient's hazard score. The prognostic categorization of the comprehensive model was compared with multiple editions of the AJCC staging system. Specifically, the low-risk group defined by the comprehensive model was compared with AJCC stage I, the intermediate-risk group with AJCC stages II and IIIA, and the high-risk group with AJCC stages IIIB/IV. Significantly longer survival in the low-risk group, or significantly poorer survival in the high-risk group, was considered an improvement in prognostication by the comprehensive model. The models were constructed by taking the mean of each coefficient from Cox models fit on 500 bootstrapped samples of each original cohort. This resulted in four models, one for each combination of AJCC staging system and major NSCLC cell type. These models were tested on the original samples in their entirety, on sub-cohorts representing four major treatment modalities, and on two external cohorts.
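Each comparison described here (for example, the model's low-risk group against AJCC stage I) is a two-sample log-rank test. A minimal sketch follows, assuming plain lists of follow-up times and 0/1 death indicators per group; the function name is hypothetical, and the P-value uses the chi-square distribution with 1 degree of freedom, for which P = erfc(sqrt(chi2/2)).

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, P-value) with 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0  # observed minus expected deaths in group A
    var = 0.0        # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n_b = sum(1 for tt, _, g in pooled if g == 1 and tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d_b = sum(1 for tt, e, g in pooled if g == 1 and tt == t and e)
        n = n_a + n_b
        d = d_a + d_b
        o_minus_e += d_a - d * n_a / n  # 2x2 table at this event time
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Two identical groups give a statistic of 0 and P = 1, whereas groups whose deaths do not overlap in time give a very small P-value.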

Table 1. Outline of patient clinical characteristics for major histology of non-small cell lung cancer and AJCC staging editions retrieved from SEER database.

https://doi.org/10.1371/journal.pone.0017493.t001

Table 2. Outline of patient clinical characteristics for external non-small cell lung cancer validation sets.

https://doi.org/10.1371/journal.pone.0017493.t002

In the overall study population, earlier stage at diagnosis was significantly associated with better disease-specific survival in univariate Cox proportional hazards models for both adenocarcinoma and squamous cell carcinoma under each AJCC staging system (P<0.05). In multivariate analyses, AJCC stage, tumor grade, patient age, race, and gender were all significant. Specifically, lower tumor grade, younger age at diagnosis, and Asian/Pacific Islander descent were all significantly associated with improved survival (P<0.05), whereas male gender and later stage at diagnosis were associated with poorer outcome across all groups. The comprehensive model incorporating all these factors showed significantly improved prognostic categorization compared with the AJCC staging system, including the latest edition, as detailed below.

The patients were then assigned into one of four treatment categories based on the treatment record in the SEER database. These categories were surgery, radiation, surgery with radiation, and no treatment listed. For simplicity, this determination was based on the presence or absence of any surgical or radiation procedure, regardless of the specific procedure.
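The four-way grouping reduces to two presence/absence flags. A minimal sketch, assuming the flags have already been derived from the SEER treatment fields (that derivation is not shown, and the function names are hypothetical):

```python
from collections import Counter


def treatment_category(had_surgery: bool, had_radiation: bool) -> str:
    """Collapse any surgical and/or radiation procedure into one of four categories."""
    if had_surgery and had_radiation:
        return "surgery with radiation"
    if had_surgery:
        return "surgery"
    if had_radiation:
        return "radiation"
    return "no treatment listed"


def tally(records):
    """Count patients per category from (had_surgery, had_radiation) pairs."""
    return Counter(treatment_category(s, r) for s, r in records)
```

Because only presence or absence is checked, a lobectomy and a wedge resection fall into the same "surgery" category.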

Patient stratification for lung adenocarcinoma (the AJCC 3rd and 6th edition)

A total of 150,158 lung adenocarcinoma patients staged with the AJCC 3rd and 6th Editions met the inclusion criteria. Harrell's C statistic was calculated for both the stage-only model and the comprehensive model using additional covariates. The comprehensive model had a higher C statistic (0.732 vs. 0.694) and better prediction of 5-year survival after initial treatment in ROC curves (P<0.0001, Fig. 1A). Similar improvements were seen for Nagelkerke's R2 (0.294 vs. 0.253) and Brier score (0.134 vs. 0.143).

Figure 1. Prediction of survival at 60 months for the AJCC 3rd and 6th Editions (top) and 30 months for the cases converted to the AJCC 7th Edition (bottom) for both lung adenocarcinoma (left) and squamous cell lung cancer (right) using ROC curves.

P<0.05 indicates that the full model is significantly more accurate in predicting disease-specific survival than tumor stage.

https://doi.org/10.1371/journal.pone.0017493.g001

The comparison of the models within treatment subgroups showed a similar improvement in predictive ability with the comprehensive model. In patients who received surgery without radiation, the comprehensive model had consistently better values for Harrell's C (0.768 vs. 0.723), Nagelkerke's R2 (0.225 vs. 0.173), and Brier score (0.206 vs. 0.210). A similar improvement, summarized in Tables 3, 4 and 5, was observed in patients receiving radiation without surgery, surgery with radiation, and those with no treatment listed.

Table 3. Harrell's C-statistics from each model for each of the patient cohorts, separated into AJCC coding system, treatment modality, and histology where possible.

https://doi.org/10.1371/journal.pone.0017493.t003

Table 4. Nagelkerke's R2 values from each model for each of the patient cohorts, separated into AJCC coding system, treatment modality, and histology where possible.

https://doi.org/10.1371/journal.pone.0017493.t004

Table 5. Brier Scores from each model for each of the patient cohorts, separated into AJCC coding system, treatment modality, and histology where possible.

https://doi.org/10.1371/journal.pone.0017493.t005

The low-risk group predicted by the comprehensive model survived significantly longer than stage I patients, with an average survival of 69.6 versus 57.2 months (log-rank P<0.0001). In addition, the high-risk group predicted by the comprehensive model had a significantly poorer survival than the stage IIIB/IV patient group, with an average survival of 5.6 months compared to 11.9 months (log-rank P<0.0001) as shown in Fig. 2C and 2D.

Figure 2. Results of survival analysis on lung adenocarcinoma patients staged using AJCC 3rd or 6th Edition.

a) Histogram of hazard scores obtained from the comprehensive model. b) Probability of death from lung cancer prior to 24 months based on hazard scores calculated using the comprehensive model. c) Kaplan-Meier survival plots for low-, intermediate-, and high-risk groups determined by the comprehensive model (blue) and AJCC staging alone (orange). d) Average survival of each group in months, with log-rank P-values shown. e) Kaplan-Meier survival plots for each risk group in patients who received surgery without radiation. f) Average survival for risk groups in patients who received surgery without radiation. L: low-risk; Int: intermediate-risk; H: high-risk, as defined by the full model. The stage-only model contains patients with stages I, II, IIIA, IIIB, and IV.

https://doi.org/10.1371/journal.pone.0017493.g002

For lung adenocarcinoma patients who received surgery without radiation, the comprehensive model improved upon the prognostic ability of AJCC staging for low-risk patients, with an average survival of 72.4 versus 62.3 months (log-rank P<0.0001). Patients in the high-risk group had an average survival of 13.3 versus 30.6 months for the comprehensive and stage-only models, respectively (log-rank P<0.0001). The intermediate-risk group defined by the comprehensive model showed significantly better prognosis than stage II and IIIA patients (log-rank P<0.0001; Fig. 2E and 2F). Similar results were observed for patients receiving other treatments (results not shown). Specifically, for patients who received both surgery and radiation, radiation without surgery, or no treatment, the comprehensive model could identify patients at higher risk as candidates for adjuvant chemotherapy and might spare low-risk patients from unnecessarily aggressive treatment.

Lung adenocarcinoma cases converted to the AJCC 7th edition

A total of 38,426 lung adenocarcinoma cases were converted to the AJCC 7th edition. Note that the converted cases form a much smaller cohort, with shorter follow-up, than the AJCC 3rd and 6th Edition cohorts. In the entire patient sample, the comprehensive model improved on the stage-only model in Harrell's C (0.763 vs. 0.731), prediction of survival at 30 months (P<0.0001, Fig. 1C), Nagelkerke's R2 (0.305 vs. 0.274), and Brier score (0.144 vs. 0.150). These improvements persisted in the four sub-cohorts defined by treatment modality, although, compared with the original staging system, the performance of both models decreased to a similar degree. The sub-cohort with no treatment listed performed worst on all three metrics. An improvement in prognostic categorization similar to that observed in the unconverted cases (AJCC 3rd and 6th Editions) was found for the overall population and for specific treatment modalities (Fig. 3). Across all treatments, the low-risk group predicted by the comprehensive model had an average survival of 16.4 months compared with 15.3 months for AJCC 7th edition stage I (log-rank P<0.0001). Prediction of the high-risk group was also significantly improved, with an average survival of 2.0 months for the comprehensive model and 3.6 months for stage IIIB/IV (log-rank P<0.0001).

Figure 3. Results of survival analysis on lung adenocarcinoma patients converted to AJCC 7th Edition.

a) Histogram of hazard scores obtained from the comprehensive model. b) Probability of death from lung cancer prior to 24 months based on hazard scores calculated using the comprehensive model. c) Kaplan-Meier survival plots for low-, intermediate-, and high-risk groups determined by the comprehensive model (blue) and AJCC staging alone (orange). d) Average survival of each group in months, with log-rank P-values shown. e) Kaplan-Meier survival plots for each risk group in patients who received surgery without radiation. f) Average survival for risk groups in patients who received surgery without radiation. L: low-risk; Int: intermediate-risk; H: high-risk, as defined by the full model. The stage-only model contains patients with stages I, II, IIIA, IIIB, and IV.

https://doi.org/10.1371/journal.pone.0017493.g003

For lung adenocarcinoma patients who received surgery without radiation, the comprehensive model significantly improved prognostication in the low-risk group (16.5 versus 16.0 months, log-rank P<0.0001). The high-risk group had an average survival of 4.3 months for the comprehensive model and 8.9 months for stage IIIB/IV (log-rank P<0.0001). The comprehensive model also improved prognostication for both the high- and low-risk groups in patients who received both surgery and radiation or no treatment (P<0.05), and in the high-risk group for patients receiving radiation without surgery (P<0.0001). In the remaining comparisons, which did not reach significance, the comprehensive model matched or non-significantly improved upon the stage-only model (results not shown).

Prognostication of squamous cell lung cancer (the AJCC 3rd and 6th edition)

A total of 84,254 squamous cell lung cancer patients staged with the AJCC 3rd and 6th editions met the inclusion criteria. In the overall patient sample, the performance of both the comprehensive and stage-only models was slightly lower than in the adenocarcinoma patients. However, in the overall treatment cohort the comprehensive model still improved on Harrell's C (0.722 vs. 0.706), prediction of 5-year survival in ROC curves (P<0.0001, Fig. 1B), and Nagelkerke's R2 (0.289 vs. 0.274), but not on Brier score (0.119 vs. 0.119). There was a similar improvement in the sub-cohorts defined by treatment modality, with the comprehensive model performing as well as or better than the stage-only model in all sub-cohorts. In the overall cohort, the low-risk group defined by the comprehensive model had an average survival of 51.3 months versus 45.7 months for stage I squamous cell lung cancer (log-rank P<0.0001). The high-risk group had an average survival of 1.7 months versus 4.7 months for stage IIIB/IV patients (log-rank P<0.0001).

Similar results were found when comparing only those who received surgical treatment, with the low-risk group predicted by the comprehensive model surviving an average of 58.2 months compared to 55.3 months for stage I patients (log-rank P<0.0001), and the high-risk group surviving an average of 1.2 versus 9.3 months in stage IIIB/IV patients (log-rank P<0.0001; Fig. 4E and 4F). Similar results were also observed for squamous cell lung cancer patients who received surgery and radiation, radiation without surgery, and no treatment (results not shown) with the comprehensive model improving prognostication among high-risk patients in all three samples (log-rank P<0.05), and in low-risk patients for those receiving surgery with radiation or no treatment (log-rank P<0.05).

Figure 4. Results of survival analysis on squamous cell lung cancer patients staged using AJCC 3rd or 6th Edition.

a) Histogram of hazard scores obtained from the comprehensive model. b) Probability of death from lung cancer prior to 24 months based on hazard scores calculated using the comprehensive model. c) Kaplan-Meier survival plots for low-, intermediate-, and high-risk groups determined by the comprehensive model (blue) and AJCC staging alone (orange). d) Average survival of each group in months, with log-rank P-values shown. e) Kaplan-Meier survival plots for each risk group in patients who received surgery without radiation. f) Average survival for risk groups in patients who received surgery without radiation. L: low-risk; Int: intermediate-risk; H: high-risk, as defined by the full model. The stage-only model contains patients with stages I, II, IIIA, IIIB, and IV.

https://doi.org/10.1371/journal.pone.0017493.g004

Squamous cell lung cancer cases converted to the AJCC 7th edition

A total of 20,208 squamous cell lung cancer cases could be converted to the AJCC 7th edition. Prediction was similar or improved with the comprehensive model on all three metrics and in all treatment cohorts considered; however, the difference between the two models was marginal in some cases. The most marked improvement was in the sub-cohort of patients receiving surgery without radiation, where the comprehensive model outperformed the stage-only model on Harrell's C (0.689 vs. 0.670), prediction of survival at 30 months (P<0.0001, Fig. 1D), Nagelkerke's R2 (0.064 vs. 0.055), and marginally on Brier score (0.113 vs. 0.114).

The low-risk group predicted by the comprehensive model survived an average of 14.7 months, representing a significantly better prognosis than average survival of 13.7 months in stage I patients (log-rank P<0.0001). The high-risk group had an average of 1.8 versus 3.0 months when compared to stage IIIB/IV patients (log-rank P<0.0001).

In patients receiving surgery without radiation, the comprehensive model predicted an average survival of 15.7 months for the low-risk group versus 15.2 months for stage I (log-rank P = 0.0114). The average survival of the high-risk group did not differ significantly from that of stage IIIB/IV (P = 0.8764), due in part to the small sample size and short follow-up, although the comprehensive model showed a non-significant improvement of 5.0 versus 7.8 months. These results are summarized in Fig. 5. In patients treated with radiation without surgery or radiation with surgery, prognostic categorization was improved only in the high-risk group, with an average survival of 2.1 versus 3.2 months and 2.4 versus 6.1 months, respectively, compared to stage alone (log-rank P = 0.0136; results not shown).

Figure 5. Results of survival analysis on squamous cell lung cancer patients converted to AJCC 7th Edition.

a) Histogram of hazard scores obtained from the comprehensive model. b) Probability of death from lung cancer prior to 24 months based on hazard scores calculated using the comprehensive model. c) Kaplan-Meier survival plots for low-, intermediate-, and high-risk groups determined by the comprehensive model (blue) and AJCC staging alone (orange). d) Average survival of each group in months, with log-rank P-values shown. e) Kaplan-Meier survival plots for each risk group in patients who received surgery without radiation. f) Average survival for risk groups in patients who received surgery without radiation. L: low-risk; Int: intermediate-risk; H: high-risk, as defined by the full model. The stage-only model contains patients with stages I, II, IIIA, IIIB, and IV.

https://doi.org/10.1371/journal.pone.0017493.g005

Treatment selection for stage I patients

Patients with stage I cancers who were treated with surgery without radiation were extracted for further analysis to determine whether the comprehensive model could identify early-stage patients who may benefit from more aggressive therapy. The stage I cohort was further separated into stage IA and IB patients, and the coefficients from the comprehensive model were applied to test the ability of the additional factors to stratify a relatively homogeneous set of patients. High- and low-risk group membership was defined relative to the median hazard score for each cohort. For adenocarcinoma, the comprehensive model stratified stage IA and IB patients under both the 3rd and 6th Editions and the 7th Edition (log-rank P<0.0001) in Kaplan-Meier analyses (Fig. 6). In squamous cell carcinoma, the comprehensive model again significantly stratified stage IA and IB patients into high- and low-risk groups under both AJCC staging schemes, using the model developed on the entire SEER cohort without re-estimation of the parameters (log-rank P<0.0001; Kaplan-Meier analyses; Fig. 7). These results demonstrate that the comprehensive prognostic model was able to reliably identify stage I NSCLC patients at higher risk for tumor recurrence. These high-risk patients should be considered for adjuvant chemotherapy.
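The median split within a sub-stage can be sketched as follows. The coefficients are fixed inputs (mirroring the use of the full-cohort model without re-estimation), and patients exactly at the median are placed in the low-risk group, which is an assumption because the article does not say how ties were handled. The function names and the example covariate vectors are hypothetical.

```python
from statistics import median


def hazard_score(coefs, covariates):
    """Linear predictor sum(beta_k * x_k) with coefficients frozen from the full model."""
    return sum(b * x for b, x in zip(coefs, covariates))


def split_by_median(patients, coefs):
    """Label each patient 'high' or 'low' relative to the cohort's median hazard score."""
    scores = [hazard_score(coefs, p) for p in patients]
    cut = median(scores)
    groups = ["high" if s > cut else "low" for s in scores]  # ties at the median go to 'low'
    return groups, cut
```

Applied separately to the stage IA and stage IB cohorts, each call yields its own cohort-specific median cutoff.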

Figure 6. Results of survival analysis on lung adenocarcinoma patients diagnosed with stage IA or IB disease.

The Kaplan-Meier plots show the difference between low- and high-risk groups as determined by the comprehensive model. Data on sub-stage were only available for patients staged using the AJCC 6th Edition staging system (2004 and later) and for patients converted to the 7th Edition.

https://doi.org/10.1371/journal.pone.0017493.g006

Figure 7. Results of survival analysis on squamous cell lung carcinoma patients diagnosed with Stage IA or IB disease.

The Kaplan-Meier plots show the difference between low- and high-risk groups as determined by the comprehensive model. Data on sub-stage were only available for patients staged using the AJCC 6th Edition staging system (2004 and later) and for patients converted to the 7th Edition.

https://doi.org/10.1371/journal.pone.0017493.g007

External Validation

The comprehensive model also improved prognostication in the external validation sets from MBRCC and the Director's Challenge cohort [21]. The MBRCC cohort included patients with adenocarcinoma (n = 794) and squamous cell carcinoma (n = 758) across all tumor stages. The Director's Challenge cohort contained only lung adenocarcinoma patients with stage I, II, or III disease (n = 439). When the models trained on the SEER cohort were applied to the MBRCC and Director's Challenge cohorts, the comprehensive model performed consistently better across all three metrics considered, and these results were consistent across histologies in the MBRCC cohort. The advantage of the comprehensive model appeared most pronounced in the MBRCC cohort. These results are summarized in Tables 3, 4, and 5.
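
This section does not name the three metrics behind Tables 3, 4, and 5. Purely as an illustration of how a model trained on one cohort is scored on another, the sketch below computes Harrell's concordance index (C-index), a common discrimination measure for survival models; treating it as one of the paper's metrics is an assumption, and the cohort values are hypothetical. The point carried over from the text is that only the risk scores are computed on the external cohort: the coefficients come from SEER and are not re-estimated.

```python
from itertools import combinations


def harrell_c_index(times, events, scores):
    """Fraction of usable patient pairs in which the higher-risk patient dies first.

    A pair is usable when the patient with the shorter follow-up had an observed
    death (not censoring). Tied scores count as half-concordant; pairs with tied
    follow-up times are skipped, a simplification of Harrell's original rules.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a = patient with the shorter observed time, b = the other patient
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored first: we cannot tell who would have died first
        usable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    return concordant / usable if usable else float("nan")


if __name__ == "__main__":
    # Hypothetical external cohort (months, death indicator) and two sets of
    # scores computed with frozen training coefficients: stage-only vs. comprehensive.
    times = [4, 10, 15, 22, 30, 41]
    events = [1, 1, 0, 1, 1, 0]
    stage_only = [2, 2, 1, 2, 1, 1]
    comprehensive = [2.6, 2.1, 1.4, 1.9, 1.2, 0.8]
    print(harrell_c_index(times, events, stage_only))
    print(harrell_c_index(times, events, comprehensive))
```

A higher C-index for the comprehensive scores than for the stage-only scores on the same external cohort would indicate better discrimination; a formal comparison would also need a confidence interval or a test for the difference.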

In the MBRCC adenocarcinoma cohort, the comprehensive model improved prognostication for the low-risk group (33 versus 24 months, P = 0.0170), and the improvement for the high-risk group was of borderline significance (2.2 versus 2.8 months, P = 0.058). Adding pathological and demographic factors did not significantly improve prognostication for squamous cell carcinoma patients from the same cohort (P>0.05). In the Director's Challenge cohort, which contained only adenocarcinomas, the comprehensive model improved prognostication for both the low-risk group (42.6 versus 36.2 months) and the high-risk group (2.2 versus 9.2 months), although these differences were not statistically significant (P>0.05). These results are illustrated in Fig. 8.
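
Figures 6 to 8 present each risk group as a Kaplan-Meier curve. How the month values quoted above were derived is not stated in this section; if they are median survival times, they would be read off curves like these. A minimal product-limit sketch under that assumption follows (all function names are hypothetical):

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate as [(event time, S(t))] at each distinct death time."""
    curve = []
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)  # still under follow-up at t
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1.0 - deaths / at_risk  # conditional probability of surviving past t
        curve.append((t, surv))
    return curve


def median_survival(curve):
    """First event time at which S(t) falls to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def median_by_group(times, events, high_risk):
    """Median survival for the high- and low-risk groups defined by a model."""
    result = {}
    for label, flag in (("high", True), ("low", False)):
        idx = [i for i, g in enumerate(high_risk) if g == flag]
        curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
        result[label] = median_survival(curve)
    return result
```

For example, `median_by_group([1, 2, 3, 10, 20, 30], [1] * 6, [True] * 3 + [False] * 3)` gives a median of 2 for the high-risk group and 20 for the low-risk group. Conventions differ when the curve sits exactly at 0.5 over an interval; this sketch simply takes the first time S(t) reaches 0.5 or below.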

Figure 8. Results of survival analyses performed on patient cohorts from the Director's Challenge Study and the Mary Babb Randolph Cancer Center at West Virginia University.

https://doi.org/10.1371/journal.pone.0017493.g008

Discussion

Substantial efforts have been made to establish prognostic factors for patients with lung cancer during the last two decades. Traditional prognostic factors include tumor size, vascular invasion, poor differentiation, high tumor-proliferative index, and genetic alterations, including K-ras [24], [25] and p53 [26]. With the development of molecular biotechnology, especially high-throughput microarrays, a number of promising studies have addressed lung cancer prognosis through transcriptional profiling [21], [27]–[35]. Although traditional prognostic factors lack information about the biological diversity of lung cancer and do not reflect the complexity of the molecular mechanisms underlying these diseases, they remain the most valuable criteria for clinicians deciding on appropriate therapy [36]. For instance, Adjuvant! (www.adjuvantonline.com) is a prognostic system for lung cancer, breast cancer, and colon cancer based on traditional clinical and pathological features, including age, tumor stage, and grade. It has been independently validated as a reliable aid to clinical decision-making for average breast cancer patients [37]. A study by Birim and others [38] also demonstrated that clinical factors such as respiratory function, comorbidity, and smoking behavior, in addition to tumor stage, could be used to refine prognosis in a cohort of NSCLC patients (n = 766).

In this study, we sought to investigate the impact of clinical, pathological, and demographic factors on lung cancer survival using a population-based approach. We found that adding pathological and demographic covariates to AJCC staging significantly improved predictive ability in both lung adenocarcinomas and squamous cell carcinomas. These additional variables accounted for previously unexplained variation within and independent of tumor stage, and, when evaluated as integrated prognostic indicators, yielded a more accurate assessment of the risk of treatment failure. This effect persisted across multiple treatment modalities.

The comprehensive model improved prediction in stage I surgical adenocarcinoma patients and produced a significant stratification even within sub-stages IA and IB. Low-risk patients identified by the comprehensive model may not benefit from additional therapies, whereas those predicted to be high-risk may benefit from adjuvant chemotherapy.

The comprehensive model demonstrated significant improvement in clinical prediction over AJCC 7th Edition staging despite smaller sample sizes and shorter follow-up. Furthermore, the external validation results indicate that the comprehensive prognostic model constructed from SEER population data could improve prognostication in independent hospital-based cohorts. These findings show promise for a clinical model for more refined prognosis of NSCLC.

It is important to note that the analysis does not account for differences in treatment quality between institutions. In an unpublished analysis, median county income was used as a rough surrogate for this factor. Higher median county income was significantly associated with improved disease-specific survival, but it was omitted from the prognostic model because it is not a prudent metric for guiding personalized treatment. Removing median income as a covariate did not significantly affect the overall results or the predictive ability of the model as a whole. An additional limitation of the study was the lack of information on chemotherapy use and on comorbidities present at the time of diagnosis [39]. We expect that including data from the linked SEER-Medicare database will address these issues more appropriately and allow further refinement of the model. In future research, we plan to use the SEER-Medicare data to construct a comprehensive model that estimates the treatment benefits of commonly used chemotherapies and partitions patients according to a specific treatment approach. A web-based implementation of this model is currently under development, offering nomograms that represent benefits for multiple treatment modalities. We envision that this model could be combined with future clinically validated gene signatures for a more refined assessment of patients' risk of treatment failure across a variety of modalities.

Acknowledgments

We thank Dr. John Rogers and Dr. Barbara Ducatman at West Virginia University for thoughtful discussion. We thank Dajie Luo at West Virginia University for developing a web application for clinical decision making based on the comprehensive model presented in this study. We are grateful to Pamela Moats and April Feathers for retrieving patient data from the Tumor Registry of Mary Babb Randolph Cancer Center.

Author Contributions

Conceived and designed the experiments: NLG SCR. Performed the experiments: JP. Analyzed the data: NLG JP. Contributed reagents/materials/analysis tools: NLG. Wrote the paper: NLG JP.

References

1. Jemal A, Siegel R, Ward E, Hao Y, Xu J, et al. (2008) Cancer statistics, 2008. CA Cancer J Clin 58: 71–96.
2. Hoffman PC, Mauer AM, Vokes EE (2000) Lung cancer. Lancet 355: 479–485.
3. Naruke T, Goya T, Tsuchiya R, Suemasu K (1988) Prognosis and survival in resected lung carcinoma based on the new international staging system. J Thorac Cardiovasc Surg 96: 440–447.
4. Dehing-Oberije C, Aerts H, Yu S, De RD, Menheere P, et al. (2010) Development and Validation of a Prognostic Model Using Blood Biomarker Information for Prediction of Survival of Non-Small-Cell Lung Cancer Patients Treated With Combined Chemotherapy and Radiation or Radiotherapy Alone (NCT00181519, NCT00573040, and NCT00572325). Int J Radiat Oncol Biol Phys.
5. Dehing-Oberije C, Yu S, De RD, Meersschout S, Van BK, et al. (2009) Development and external validation of prognostic model for 2-year survival of non-small-cell lung cancer patients treated with chemoradiotherapy. Int J Radiat Oncol Biol Phys 74: 355–362.
6. Dehing-Oberije C, De RD, van der WH, Hochstenbag M, Bootsma G, et al. (2008) Tumor volume combined with number of positive lymph node stations is a more important prognostic factor than TNM stage for survival of non-small-cell lung cancer patients treated with (chemo)radiotherapy. Int J Radiat Oncol Biol Phys 70: 1039–1044.
7. Werner-Wasik M, Swann RS, Bradley J, Graham M, Emami B, et al. (2008) Increasing tumor volume is predictive of poor overall and progression-free survival: secondary analysis of the Radiation Therapy Oncology Group 93-11 phase I-II radiation dose-escalation study in patients with inoperable non-small-cell lung cancer. Int J Radiat Oncol Biol Phys 70: 385–390.
8. Basaki K, Abe Y, Aoki M, Kondo H, Hatayama Y, et al. (2006) Prognostic factors for survival in stage III non-small-cell lung cancer treated with definitive radiation therapy: impact of tumor volume. Int J Radiat Oncol Biol Phys 64: 449–454.
9. Solan MJ, Werner-Wasik M (2003) Prognostic factors in non-small cell lung cancer. Semin Surg Oncol 21: 64–73.
10. Ball D, Smith J, Wirth A, Mac MM (2002) Failure of T stage to predict survival in patients with non-small-cell lung cancer treated by radiotherapy with or without concomitant chemotherapy. Int J Radiat Oncol Biol Phys 54: 1007–1013.
11. Brundage MD, Davies D, Mackillop WJ (2002) Prognostic factors in non-small cell lung cancer: a decade of progress. Chest 122: 1037–1057.
12. Bradley JD, Ieumwananonthachai N, Purdy JA, Wasserman TH, Lockett MA, et al. (2002) Gross tumor volume, critical prognostic factor in patients treated with three-dimensional conformal radiation therapy for non-small-cell lung carcinoma. Int J Radiat Oncol Biol Phys 52: 49–57.
13. Brown JS, Eraut D, Trask C, Davison AG (1996) Age and the treatment of lung cancer. Thorax 51: 564–568.
14. O'Rourke MA, Feussner JR, Feigl P, Laszlo J (1987) Age trends of lung cancer stage at diagnosis. Implications for lung cancer screening in the elderly. JAMA 258: 921–926.
15. Visbal AL, Williams BA, Nichols FC III, Marks RS, Jett JR, et al. (2004) Gender differences in non-small-cell lung cancer survival: an analysis of 4,618 patients diagnosed between 1997 and 2002. Ann Thorac Surg 78: 209–215.
16. Thatcher N, Chang A, Parikh P, Rodrigues PJ, Ciuleanu T, et al. (2005) Gefitinib plus best supportive care in previously treated patients with refractory advanced non-small-cell lung cancer: results from a randomised, placebo-controlled, multicentre study (Iressa Survival Evaluation in Lung Cancer). Lancet 366: 1527–1537.
17. Clegg LX, Li FP, Hankey BF, Chu K, Edwards BK (2002) Cancer survival among US whites and minorities: a SEER (Surveillance, Epidemiology, and End Results) Program population-based study. Arch Intern Med 162: 1985–1993.
18. Dalton WS, Friend SH (2006) Cancer biomarkers–an invitation to the table. Science 312: 1165–1168.
19. American Joint Committee on Cancer (2010) AJCC Cancer Staging Manual. New York: Springer-Verlag.
20. SEER (2010) Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence - SEER 9 Regs Research Data, Nov 2009 Sub (1973-2007) <Katrina/Rita Population Adjustment> - Linked To County Attributes - Total U.S., 1969-2007 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2010, based on the November 2009 submission.
21. Shedden K, Taylor JM, Enkemann SA, Tsao MS, Yeatman TJ, et al. (2008) Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 14: 822–827.
22. Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, et al. (2001) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 54: 774–781.
23. Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, et al. (2001) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 54: 774–781.
24. Rodenhuis S, van de Wetering ML, Mooi WJ, Evers SG, van ZN, et al. (1987) Mutational activation of the K-ras oncogene. A possible pathogenetic factor in adenocarcinoma of the lung. N Engl J Med 317: 929–935.
25. Slebos RJ, Kibbelaar RE, Dalesio O, Kooistra A, Stam J, et al. (1990) K-ras oncogene activation as a prognostic marker in adenocarcinoma of the lung. N Engl J Med 323: 561–565.
26. Horio Y, Takahashi T, Kuroishi T, Hibi K, Suyama M, et al. (1993) Prognostic significance of p53 mutations and 3p deletions in primary resected non-small cell lung cancer. Cancer Res 53: 1–4.
27. Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, et al. (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 8: 816–824.
28. Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A 98: 13790–13795.
29. Bild AH, Yao G, Chang JT, Wang Q, Potti A, et al. (2006) Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439: 353–357.
30. Borczuk AC, Kim HK, Yegen HA, Friedman RA, Powell CA (2005) Lung adenocarcinoma global profiling identifies type II transforming growth factor-beta receptor as a repressor of invasiveness. Am J Respir Crit Care Med 172: 729–737.
31. Chen HY, Yu SL, Chen CH, Chang GC, Chen CY, et al. (2007) A five-gene signature and clinical outcome in non-small-cell lung cancer. N Engl J Med 356: 11–20.
32. Potti A, Mukherjee S, Petersen R, Dressman HK, Bild A, et al. (2006) A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. N Engl J Med 355: 570–580.
33. Raponi M, Zhang Y, Yu J, Chen G, Lee G, et al. (2006) Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Res 66: 7466–7472.
34. Guo L, Ma Y, Ward R, Castranova V, Shi X, et al. (2006) Constructing molecular classifiers for the accurate prognosis of lung adenocarcinoma. Clin Cancer Res 12: 3344–3354.
35. Guo NL, Wan YW, Tosun K, Lin H, Msiska Z, et al. (2008) Confirmation of gene expression-based prediction of survival in non-small cell lung cancer. Clin Cancer Res 14: 8213–8220.
36. Ludwig JA, Weinstein JN (2005) Biomarkers in cancer staging, prognosis and treatment selection. Nat Rev Cancer 5: 845–856.
37. Olivotto IA, Bajdik CD, Ravdin PM, Speers CH, Coldman AJ, et al. (2005) Population-based validation of the prognostic model ADJUVANT! for early breast cancer. J Clin Oncol 23: 2716–2725.
38. Birim O, Kappetein AP, Waleboer M, Puvimanasinghe JP, Eijkemans MJ, et al. (2006) Long-term survival after non-small cell lung cancer surgery: development and validation of a prognostic model with a preoperative and postoperative mode. J Thorac Cardiovasc Surg 132: 491–498.
39. Firat S, Byhardt RW, Gore E (2002) Comorbidity and Karnofksy performance score are independent prognostic factors in stage III non-small-cell lung cancer: an institutional analysis of patients treated on four RTOG studies. Radiation Therapy Oncology Group. Int J Radiat Oncol Biol Phys 54: 357–364.