Abstract
The purpose of this study is to develop a computed tomography (CT) biomarker of emphysema that is robust across reconstruction settings, and to evaluate its ability to predict mortality in patients at high risk for lung cancer. Data included baseline CT scans acquired between August 2002 and April 2004 from 1737 deceased subjects and 4003 surviving controls taken from the National Lung Screening Trial. Emphysema scores were computed in the original scans (origES) and after applying resampling, normalization and bullae analysis (normES). We compared the prognostic value of normES versus origES for lung cancer and all-cause mortality by computing the area under the receiver operating characteristic curve (AUC) and the net reclassification improvement (NRI) for follow-up times of 1–7 years. normES was a better predictor of mortality than origES. The 95% confidence intervals for the differences in AUC values indicated a significant difference for all-cause mortality for 2 through 6 years of follow-up, and for lung cancer mortality for 1 through 7 years of follow-up. The 95% confidence intervals for the NRI values showed a statistically significant improvement in classification for all-cause mortality for 2 through 7 years of follow-up, and for lung cancer mortality for 3 through 7 years of follow-up. Unlike the conventional emphysema score, our normalized emphysema score is a good predictor of all-cause and lung cancer mortality in settings where multiple CT scanners and protocols are used.
Citation: Gallardo-Estrella L, Pompe E, de Jong PA, Jacobs C, van Rikxoort EM, Prokop M, et al. (2017) Normalized emphysema scores on low dose CT: Validation as an imaging biomarker for mortality. PLoS ONE 12(12): e0188902. https://doi.org/10.1371/journal.pone.0188902
Editor: Christophe Leroyer, Universite de Bretagne Occidentale, FRANCE
Received: June 4, 2017; Accepted: November 14, 2017; Published: December 11, 2017
Copyright: © 2017 Gallardo-Estrella et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: None of the authors in the paper are affiliated with the U.S. National Cancer Institute (NCI) and none of the authors own the dataset. Data was obtained from the National Lung Screening Trial (NLST) study, described in the project "Automatic detection of pulmonary, cardiovascular and skeletal findings and their relation to all-cause mortality in NLST data" (Project ID: NLST-111) (https://biometry.nci.nih.gov/cdas/approved-projects/791/). All de-identified NLST data (CT images, mortality status, mortality cause, gender, age, pack-years, smoker status, follow-up time, and reconstruction settings) are available from the NCI Cancer Data Access System (CDAS) at (https://biometry.nci.nih.gov/cdas) to all researchers by submitting an application at CDAS. In this study, we computed origES and normES from the CT images provided by the NCI. These variables are provided in a csv file included as Supporting Information.
Funding: The author(s) received no specific funding for this work.
Competing interests: Colin Jacobs receives a research grant from MeVis Medical Solutions AG; Eva M. van Rikxoort is a shareholder and co-founder at Thirona BV; Mathias Prokop receives a grant from Toshiba and is on the speaker’s bureau for Bracco, Bayer and Toshiba; Bram van Ginneken is a shareholder and co-founder at Thirona BV. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Introduction
The presence of emphysema, visually assessed on CT images, has been shown to be a risk factor for lung cancer and overall mortality [1–4]. A known drawback of visual assessment is its subjectivity, which makes it susceptible to observer variability [5]. To overcome this limitation, computerized methods to quantify emphysema objectively have been developed. The most widely used measure is the emphysema score (ES), defined as the percentage of lung voxels below a given Hounsfield unit (HU) threshold. ES has previously been associated with mortality in single-center studies using a single imaging protocol [6, 7]. However, Martinez et al. [8] found no association between ES and mortality risk in a study using data from multiple centers. Furthermore, Gierada et al. [9] showed only a weak association between ES and lung cancer in subjects from the multi-center National Lung Screening Trial (NLST).
One possible explanation for this discrepancy is the heterogeneity of the reconstruction parameters used in multi-center studies, whereas each single-center study used a single imaging protocol. ES is known to vary with, among other factors, slice thickness and reconstruction kernel [10]. Hence, the variability introduced by different reconstruction settings may obscure the association between ES and mortality.
Several methods have been proposed to overcome this problem. Blechschmidt et al. [11] proposed a morphology-based method that classifies bullae according to their size. The algorithm improved emphysema quantification by ignoring isolated low-attenuation voxels that were considered noise. Another way to reduce the variability of ES across reconstruction kernels is a recently introduced normalization algorithm [12]; with this algorithm, normalized ES was independent of reconstruction settings and correlated better with lung function parameters.
We present a normalized ES (normES), obtained by applying resampling, normalization and bullae analysis prior to emphysema quantification. We hypothesize that normES is a better univariate predictor of mortality than the conventional emphysema score. The purpose of this study is therefore to develop a CT biomarker of emphysema that is robust across reconstruction settings, and to evaluate its ability to predict mortality in patients at high risk for lung cancer, compared with conventional emphysema measurements.
Materials and methods
Ethics statement
Data were obtained from the National Lung Screening Trial (NLST), which was approved by an institutional review board at each screening center; all participants provided written informed consent. Data received from the NLST were de-identified/anonymized prior to access and analysis, so no identifiable information was used. For this reason, institutional review board (IRB) approval was not required. Nonetheless, a waiver of approval was given by the Radboud University Medical Center IRB.
Participants
This retrospective study used data from the CT arm of the NLST. The NLST is registered with clinical trial registration number NCT00047385 (https://clinicaltrials.gov/ct2/show/NCT00047385). Enrollment criteria and study design have been described previously [13]. The NLST was a multicenter, randomized controlled trial in which participants enrolled at 33 centers in the United States between August 2002 and April 2004 underwent three annual screenings with either chest radiography or low-dose CT. The primary goal of the trial was to compare mortality rates in the low-dose CT arm with those in the chest radiography arm.
Because the NLST allows researchers to request CT image data for at most 6000 participants, we included all 1810 subjects who died during the trial (cases) and 4190 controls randomly selected from the surviving group (subjects either censored or still alive at the end of the trial). With this study design, the odds ratio obtained will be approximately equal to the odds ratio in the full cohort, and it will also approximate the risk ratio in the complete cohort [14].
Only baseline scans (T0) were included in the present study, together with the corresponding all-cause and lung cancer mortality outcomes. The data used in this study are described in project NLST-111 of the NCI Cancer Data Access System.
Of the 6000 subjects, 260 were excluded: 97 had no baseline CT scan, 161 had corrupted DICOM data and 2 had incomplete CT images. This left 5740 subjects (4003 alive, 1737 deceased). The flow of participants through the study is shown in Fig 1.
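The exclusion counts can be cross-checked with a few lines of arithmetic. The sketch below (plain Python; the dictionary keys are our own labels) confirms that the 260 exclusions account exactly for the gap between the 6000 requested and 5740 analyzed subjects, and derives the implied split of the exclusions by outcome group, which the text does not state explicitly.

```python
# Participant flow, with counts copied from the text above.
requested = {"deceased": 1810, "alive": 4190}
excluded = {"no baseline CT": 97, "corrupted DICOM": 161, "incomplete CT": 2}
analyzed = {"deceased": 1737, "alive": 4003}

total_requested = sum(requested.values())
total_excluded = sum(excluded.values())
total_analyzed = sum(analyzed.values())

# The exclusions must account exactly for requested minus analyzed subjects.
assert total_requested - total_excluded == total_analyzed

# Exclusions per outcome group implied by the counts (not reported directly).
excluded_by_group = {g: requested[g] - analyzed[g] for g in requested}
print(total_requested, total_excluded, total_analyzed, excluded_by_group)
# 6000 260 5740 {'deceased': 73, 'alive': 187}
```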
The study population comprised 2174 (37.9%) women and 3566 (62.1%) men. The median age was 61 years (interquartile range, 57–65 years) for women and 61 years (interquartile range, 58–66 years) for men. The reconstruction settings of the baseline CT images are shown in S1 Table.
Quantification of emphysema
Emphysema quantification was performed using CIRRUS Lung Quantification (Diagnostic Image Analysis Group, Nijmegen, The Netherlands; Fraunhofer MEVIS, Bremen, Germany). First, the lungs were automatically extracted using a segmentation algorithm based on region growing and morphological smoothing [15]. The extent of emphysema was then expressed as the emphysema score (ES), defined as the percentage of lung voxels with intensity values below −950 HU. Emphysema scores were computed in the original CT scans (origES) and in the images obtained after resampling to 3 mm slice thickness, normalization and bullae analysis (normES). The goal of this processing is to reduce the variability in emphysema quantification caused by differences in slice thickness, reconstruction kernel and noise. Normalization reduces the variability in ES caused by different reconstruction kernels by altering the appearance of each CT scan so that it acquires characteristics similar to those of a reference kernel [12]. The bullae analysis algorithm detects air clusters inside the lungs and ignores those smaller than 5 mm², as they are assumed to be noise [11]. The parameters of these algorithms were derived from the datasets described in [12] and [11], respectively, which were completely independent of the NLST data. A detailed description of the algorithm used to obtain normES is given in S1 Appendix.
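To make the score definition concrete, the following is a minimal NumPy/SciPy sketch, not the CIRRUS implementation. `emphysema_score` follows the −950 HU definition used for origES. `filtered_emphysema_score` adds a simplified stand-in for the bullae analysis step; it assumes that cluster size is measured as in-plane area per axial slice and uses SciPy's default 4-connectivity, both of which are our assumptions (the actual algorithm is described in [11] and S1 Appendix). Resampling and kernel normalization are not sketched, and all function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def emphysema_score(hu, lung_mask, threshold=-950.0):
    """origES-style score: % of lung voxels with HU below `threshold`."""
    lung = hu[lung_mask]
    return 100.0 * np.count_nonzero(lung < threshold) / lung.size


def remove_small_clusters(low_att, pixel_area_mm2, min_area_mm2=5.0):
    """Keep only low-attenuation clusters with in-plane area >= min_area_mm2.

    Assumption: clusters are labelled per axial slice (4-connectivity, SciPy default).
    """
    kept = np.zeros(low_att.shape, dtype=bool)
    for z in range(low_att.shape[0]):
        labels, n = ndimage.label(low_att[z])
        if n == 0:
            continue
        areas = np.bincount(labels.ravel()) * pixel_area_mm2  # entry 0 is background
        large = areas >= min_area_mm2
        large[0] = False                                       # never keep background
        kept[z] = large[labels]
    return kept


def filtered_emphysema_score(hu, lung_mask, pixel_area_mm2,
                             threshold=-950.0, min_area_mm2=5.0):
    """ES after discarding small (noise-like) low-attenuation clusters."""
    low_att = (hu < threshold) & lung_mask
    kept = remove_small_clusters(low_att, pixel_area_mm2, min_area_mm2)
    return 100.0 * np.count_nonzero(kept) / np.count_nonzero(lung_mask)


# Toy volume: 2 slices of 20 x 20 voxels, 0.7 mm x 0.7 mm pixels (0.49 mm^2).
hu = np.full((2, 20, 20), -850.0)
hu[0, 2:6, 2:6] = -980.0   # 16-voxel cluster, about 7.84 mm^2
hu[1, 10, 10] = -990.0     # isolated voxel, treated as noise
mask = np.ones(hu.shape, dtype=bool)
print(emphysema_score(hu, mask))                                # 2.125
print(filtered_emphysema_score(hu, mask, pixel_area_mm2=0.49))  # 2.0
```

In this toy volume the isolated voxel counts toward the plain score but is discarded by the cluster filter, because its 0.49 mm² area is below the 5 mm² limit.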
Statistical analysis
To divide subjects into categories based on their emphysema scores, we followed the procedure of Johannessen et al. [7]. The degree of emphysema was divided into three categories: a) low, for ES below the 60th percentile; b) medium, for ES between the 60th and 80th percentiles; c) high, for ES above the 80th percentile.
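The percentile-based categorization can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the cut-offs are taken from the cohort itself with NumPy's default linear interpolation, and a score exactly equal to a cut-off is assigned to the medium group, a boundary convention the text does not specify. The function name is hypothetical.

```python
import numpy as np


def emphysema_category(scores):
    """Label each subject low/medium/high using cohort 60th and 80th percentiles."""
    scores = np.asarray(scores, dtype=float)
    p60, p80 = np.percentile(scores, [60, 80])  # linear interpolation (NumPy default)
    # Assumed convention: medium includes both cut-off values.
    labels = np.where(scores < p60, "low", np.where(scores <= p80, "medium", "high"))
    return labels, (p60, p80)


labels, (p60, p80) = emphysema_category(np.arange(1, 11))
print(p60, p80)
print(labels.tolist())
```

Because the cut-offs are recomputed per score, origES and normES each get their own thresholds, which is why the two scores can be compared on a common low/medium/high scale despite very different absolute values.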
Kaplan-Meier analyses were used to compute survival curves according to emphysema severity. Pairwise log-rank comparisons were conducted to determine which emphysema groups had different survival distributions. A Bonferroni correction for the three pairwise comparisons was applied, so statistical significance was accepted at p < 0.0167 (0.05/3). Because the death rate in our cohort is higher than in the full NLST cohort, the alive sub-cohort was uniformly resampled to simulate the full alive cohort.
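The sketch below illustrates three ingredients of this paragraph: the product-limit (Kaplan-Meier) estimator, the Bonferroni-adjusted threshold for three pairwise comparisons, and one possible reading of the uniform resampling of the alive sub-cohort (drawing with replacement up to a target size). The target size `n_full_alive` and all function names are hypothetical, and the resampling scheme is our assumption rather than the authors' documented procedure. The log-rank tests themselves are not reimplemented here; in R, `survival::survdiff` performs this test.

```python
import numpy as np


def kaplan_meier(time, event):
    """Product-limit survival estimate evaluated at each distinct event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)            # subjects still followed at t
        deaths = np.sum((time == t) & event)   # deaths observed exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)


def bonferroni_alpha(alpha=0.05, n_comparisons=3):
    """Per-comparison level for low-medium, low-high and medium-high tests."""
    return alpha / n_comparisons


def upsample_alive(alive_idx, n_full_alive, rng):
    """Uniformly draw, with replacement, n_full_alive indices of alive subjects.

    n_full_alive is a hypothetical target: the size of the full alive cohort.
    """
    return rng.choice(np.asarray(alive_idx), size=n_full_alive, replace=True)


times, surv = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(times, surv)
print(round(bonferroni_alpha(), 4))  # 0.0167
```

In the toy example, the subject censored at time 2 still counts as at risk at time 2 but not afterwards, which is how censoring enters the product-limit estimate.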
Time-dependent receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) were computed using non-parametric estimates for survival data [16–18] to evaluate the ability of origES and normES to predict all-cause and lung cancer mortality at follow-up times of 1–7 years. Differences in AUCs were evaluated using bootstrap methods [19] with 6000 bootstrap samples.
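A simplified version of the time-dependent AUC and its bootstrap comparison can be sketched as below. At a horizon t, cases are subjects who died by t and controls are subjects still under follow-up after t; subjects censored before t are simply dropped. This is an assumption-laden simplification: the estimators cited [16–18] reweight for censoring and for the two-phase sampling, so numbers from this sketch will differ from those reported. The percentile bootstrap interval and the function names are our own illustrative choices.

```python
import numpy as np


def auc_at(t, time, event, marker):
    """Simplified cumulative/dynamic AUC at horizon t (no censoring weights)."""
    time, event, marker = (np.asarray(a, dtype=float) for a in (time, event, marker))
    cases = marker[(time <= t) & (event == 1)]   # died by t
    controls = marker[time > t]                  # still followed after t
    if cases.size == 0 or controls.size == 0:
        return np.nan
    diff = cases[:, None] - controls[None, :]    # all case-control pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def bootstrap_auc_difference(t, time, event, m_old, m_new, n_boot=6000, seed=0):
    """Percentile 95% CI for AUC(m_new) - AUC(m_old), resampling subjects."""
    time, event, m_old, m_new = (np.asarray(a, dtype=float)
                                 for a in (time, event, m_old, m_new))
    rng = np.random.default_rng(seed)
    n = time.size
    diffs = []
    for _ in range(n_boot):
        i = rng.integers(0, n, size=n)           # sample subjects with replacement
        d = auc_at(t, time[i], event[i], m_new[i]) - auc_at(t, time[i], event[i], m_old[i])
        if not np.isnan(d):                      # skip samples lacking cases or controls
            diffs.append(d)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return lo, hi


time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 1, 0, 0, 0]
marker = [6, 5, 4, 3, 2, 1]
print(auc_at(3.5, time, event, marker))  # 1.0
```

A confidence interval for the difference that excludes zero corresponds to the "significant difference" criterion used in the text. The pairwise comparison is O(cases × controls) per evaluation; a rank-based formula would be faster for cohorts of this size.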
To assess the added value of normES over origES, we computed the continuous net reclassification improvement (NRI) for censored data [20, 21]. NRI quantifies the extent to which a new biomarker assigns higher probabilities to individuals with the outcome and lower probabilities to individuals without the outcome, compared with an initial biomarker.
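The idea behind the continuous (category-free) NRI can be sketched for a binary outcome without censoring. The inputs `p_old` and `p_new` are hypothetical predicted risks, for example from models based on origES and normES respectively; the censored-data estimator actually used [20, 21] handles follow-up time differently, so this is an illustration of the concept only. The NRI ranges from −2 to 2.

```python
import numpy as np


def continuous_nri(p_old, p_new, outcome):
    """Category-free NRI for a binary outcome (no censoring).

    Returns (total NRI, event component, non-event component).
    """
    p_old, p_new = np.asarray(p_old, dtype=float), np.asarray(p_new, dtype=float)
    events = np.asarray(outcome) == 1
    up, down = p_new > p_old, p_new < p_old
    nri_events = up[events].mean() - down[events].mean()        # events should move up
    nri_nonevents = down[~events].mean() - up[~events].mean()   # non-events should move down
    return nri_events + nri_nonevents, nri_events, nri_nonevents


total, ev, nev = continuous_nri(
    p_old=[0.2, 0.2, 0.2, 0.2],
    p_new=[0.3, 0.3, 0.1, 0.3],
    outcome=[1, 1, 0, 0],
)
print(total, ev, nev)  # 1.0 1.0 0.0
```

Here both deaths receive higher risk under the new marker (event component 1.0), while the survivors move down and up in equal numbers (non-event component 0.0).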
Statistical analyses were performed using SPSS (version 23.0, IBM Corp., Armonk, NY) and R statistical package (version 3.2.1, R Foundation, Vienna, Austria) with packages “survMarkerTwoPhase” (version 1.1), “survival” (version 2.38.1) and “survIDINRI” (version 1.1).
Results
Baseline demographic characteristics of the study participants are shown in Table 1. The 60th and 80th percentiles were 5.4% and 12.0%, respectively, for origES, and 0.5% and 1.9%, respectively, for normES.
Kaplan-Meier plots show survival estimates for the low, medium and high emphysema groups for all-cause and lung cancer mortality (Figs 2 and 3). We found that the differences in risk of all-cause and lung cancer mortality among emphysema categories became much more pronounced using normES instead of origES. Survival distributions were significantly different across all categories for all-cause mortality (p ≤ 0.003) and lung cancer mortality (p <0.001) when emphysema was quantified by normES. For origES and all-cause mortality, survival distributions were not statistically different between the low and medium emphysema categories (p = 0.272). For origES and lung cancer mortality, survival distributions were statistically different only between the low and high emphysema categories (p = 0.001).
Blue, low emphysema category; green, medium emphysema category; orange, high emphysema category. Tick marks on the curves indicate censored data. Vertical dashed lines indicate time points at 730 days (2 years), 1460 days (4 years) and 2190 days (6 years). Numbers above the dashed vertical lines indicate the number of patients still being followed up at the corresponding time point. Patients who are no longer followed up may be censored or deceased.
NRI and ROC analysis results for all-cause mortality are shown in Table 2. These results show that normES is a better predictor of all-cause mortality than origES. The 95% confidence interval for the difference in AUCs indicates a significant difference between origES and normES for follow-up times of 2–6 years. Furthermore, NRI values indicate a statistically significant improvement in classification with normES for 2–7 years of follow-up.
A similar trend was observed for lung cancer mortality, as shown in Table 3. In this case, there was a significant difference between the AUCs of origES and normES for follow-up times of 1–6 years, and the NRI values indicated a significant improvement in classification for follow-up times of 3–7 years.
Figs 4–6 illustrate the effect of the normalization algorithm in subjects with different levels of emphysema.
The subject died after 1101 days. The CT image was acquired using a GE LightSpeed Pro 16 scanner and reconstructed with STANDARD kernel and 5mm slice thickness. (A) Shows the original CT section, (B) shows the original CT section with an emphysema overlay (origES), (C) shows the normalized CT section, and (D) shows the normalized CT section with a normalized emphysema overlay (normES).
The subject was followed up for 2595 days. The CT image was acquired using a Siemens Sensation 16 scanner and reconstructed with B70f kernel and 2mm slice thickness. (A) Shows the original CT section, (B) shows the original CT section with an emphysema overlay (origES), (C) shows the normalized CT section, and (D) shows the normalized CT section with a normalized emphysema overlay (normES).
The subject died after 2087 days. The CT image was acquired using a Toshiba Aquilion scanner and reconstructed with FC51 kernel and 2mm slice thickness. (A) Shows the original CT section, (B) shows the original CT section with an emphysema overlay (origES), (C) shows the normalized CT section, and (D) shows the normalized CT section with a normalized emphysema overlay (normES).
Discussion
This study shows that normES has a higher prognostic value than origES for all-cause and lung cancer mortality in a large lung cancer screening cohort using CT data from multiple centers. Therefore, normES may be used as a robust emphysema biomarker to identify patients who are at increased risk of death and who may benefit from early treatment or more frequent screening. To our knowledge, this is the first study to analyze the relation between computerized emphysema quantification and mortality in a large and heterogeneous database.
Several studies have previously analyzed the relationship between emphysema and mortality. Computerized emphysema scores have been found to predict all-cause mortality in patients with α1-antitrypsin deficiency [22] and in patients with various stages of COPD [6]. Johannessen et al. [7] showed that emphysema severity was associated with increased all-cause mortality in a cohort of patients with and without COPD. Additionally, Zulueta et al. [4] reported that visually assessed emphysema was a significant predictor of lung cancer mortality in a lung cancer screening cohort.
In contrast, Martinez et al. [8] found that computerized emphysema scores were not associated with mortality in a cohort of patients with severe emphysema, and Gierada et al. [9] found only a weak association between emphysema quantification and lung cancer risk using a smaller set of NLST data. We hypothesized that this discrepancy was due to differences in CT data induced by the acquisition protocol: whereas each of the previously cited studies [4, 6, 7, 22] used a single protocol, the data in the studies of Martinez et al. [8] and Gierada et al. [9] were obtained from multiple centers with different imaging protocols. We also hypothesized that our normalization procedure, based on a previously proposed algorithm to reduce variability in ES [12], would correct for the confounding effect of CT acquisition protocols. This is illustrated in Figs 4–6, which show examples where normalized emphysema scores agree with what can be visually assessed as emphysema, whereas standard emphysema scores are too low when a soft kernel with thick sections is used (Fig 4) and too high when a very sharp reconstruction kernel is used (Fig 5).
Our results are consistent with both hypotheses. Kaplan-Meier curves for all-cause mortality showed an evident increase in risk of death already 2 years after the baseline CT scan for patients with severe emphysema as measured by normES, whereas for origES there was no noticeable increase in risk until 4 years. This trend was even more pronounced for lung cancer mortality: the increase in risk of death was noticeable after 3 years for normES, but only after 6 years for origES. Furthermore, NRI analysis showed improved risk reclassification for all-cause mortality with normES compared with origES for follow-up times of 2 years and longer.
Results on the association between emphysema and mortality are scarce, probably because visual scoring of emphysema is time-consuming and prone to inter-observer variability. Computerized emphysema scoring eliminates this variability but is still highly sensitive to CT reconstruction settings. This complicates the comparison of data from different sources and thus impedes use in clinical routine, where variations in CT acquisition protocols and reconstruction kernels are inevitable. The presented normalization method overcomes these limitations and provides a robust emphysema biomarker that is easily computed automatically, can facilitate risk assessment and could be included in follow-up management strategies. Furthermore, normalized emphysema scores can be of value in lung cancer prediction models that take into account not only nodule characteristics but also the presence of emphysema [23].
Our study has some limitations. First, we analyzed the predictive value of emphysema quantification without taking other possible covariates into account. However, the goal of this study was not to create a complete prediction model for mortality, but to compare normalized emphysema scores with standard emphysema scores. Our results suggest that normES is a robust marker that may have important prognostic value, especially in multi-center studies. We believe that it can improve the predictive ability of existing risk prediction models that include the amount of emphysema as a variable. Second, the AUC values obtained are not high, but this is to be expected because we used only one marker, computed at baseline, to predict mortality over many years of follow-up. While further validation of normES in other lung cancer screening cohorts and as part of more elaborate lung cancer prediction models is needed, we believe that the results of this study can have relevant clinical implications for the management and follow-up of patients in lung cancer screening programs.
Conclusion
We have presented a robust CT imaging biomarker for emphysema that is associated with all-cause and lung cancer mortality in a high-risk population regardless of the imaging protocol used. This biomarker can be used in risk prediction models, could improve follow-up management and might increase the cost-effectiveness of lung cancer screening programs.
Supporting information
S1 Table. Reconstruction parameters of the selected dataset.
https://doi.org/10.1371/journal.pone.0188902.s001
(PDF)
S1 Appendix. Supplementary methods.
Detailed description of the resampling, normalization and bullae analysis algorithms used in this work.
https://doi.org/10.1371/journal.pone.0188902.s002
(PDF)
S1 Fig. Illustration of the resampling method.
https://doi.org/10.1371/journal.pone.0188902.s003
(PDF)
S2 Fig. Illustration of the separation of the original image into energy bands.
https://doi.org/10.1371/journal.pone.0188902.s004
(PDF)
S1 File. origES and normES values for every subject in the dataset.
https://doi.org/10.1371/journal.pone.0188902.s005
(CSV)
Acknowledgments
The authors thank the National Cancer Institute for access to NCI data collected by the National Lung Screening Trial. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by NCI.
References
- 1. de Torres JP, Bastarrika G, Wisnivesky JP, Alcaide AB, Campo A, Seijo LM, et al. Assessing the relationship between lung cancer risk and emphysema detected on low-dose CT of the chest. Chest. 2007;132:1932–1938. pmid:18079226
- 2. Henschke CI, Yip R, Boffetta P, Markowitz S, Miller A, Hanaoka T, et al. CT screening for lung cancer: Importance of emphysema for never smokers and smokers. Lung Cancer. 2015;88:42–47. pmid:25698134
- 3. Wilson DO, Weissfeld JL, Balkan A, Schragin JG, Fuhrman CR, Fisher SN, et al. Association of radiographic emphysema and airflow obstruction with lung cancer. Am J Respir Crit Care Med. 2008;178:738–744. pmid:18565949
- 4. Zulueta JJ, Wisnivesky JP, Henschke CI, Yip R, Farooqi AO, McCauley DI, et al. Emphysema scores predict death from COPD and lung cancer. Chest. 2012;141:1216–1223. pmid:22016483
- 5. Bankier AA, De Maertelaer V, Keyzer C, Gevenois PA. Pulmonary emphysema: subjective visual grading versus objective quantification with macroscopic morphometry and thin-section CT densitometry. Radiology. 1999;211:851–858. pmid:10352615
- 6. Haruna A, Muro S, Nakano Y, Ohara T, Hoshino Y, Ogawa E, et al. CT scan findings of emphysema predict mortality in COPD. Chest. 2010;138:635–640. pmid:20382712
- 7. Johannessen A, Skorge TD, Bottai M, Grydeland TB, Nilsen RM, Coxson H, et al. Mortality by level of emphysema and airway wall thickness. Am J Respir Crit Care Med. 2013;187:602–608. pmid:23328525
- 8. Martinez FJ, Foster G, Curtis JL, Criner G, Weinmann G, Fishman A, et al. Predictors of mortality in patients with emphysema and severe airflow obstruction. Am J Respir Crit Care Med. 2006;173:1326–1334. pmid:16543549
- 9. Gierada DS, Guniganti P, Newman BJ, Dransfield MT, Kvale PA, Lynch DA, et al. Quantitative CT assessment of emphysema and airways in relation to lung cancer risk. Radiology. 2011;261:950–959. pmid:21900623
- 10. Mets OM, de Jong PA, van Ginneken B, Gietema HA, Lammers JWJ. Quantitative Computed Tomography in COPD: Possibilities and Limitations. Lung. 2012;190:133–145. pmid:22179694
- 11. Blechschmidt RA, Werthschützky R, Lörcher U. Automated CT Image Evaluation of the Lung: A Morphology-Based Concept. IEEE Trans Med Imaging. 2001;20:434–442. pmid:11403202
- 12. Gallardo-Estrella L, Lynch DA, Prokop M, Stinson D, Zach J, Judy PF, et al. Normalizing computed tomography data reconstructed with different filter kernels: effect on emphysema quantification. Eur Radiol. 2016;26:478–486. pmid:26002132
- 13. Aberle DR, Berg CD, Black WC, Church TR, Fagerstrom RM, Galen B, et al. The National Lung Screening Trial: overview and study design. Radiology. 2011;258:243–253. pmid:21045183
- 14. Pearce N. Classification of epidemiological study designs. Int J Epidemiol. 2012;41:393–397. pmid:22493323
- 15. van Rikxoort EM, de Hoop B, Viergever MA, Prokop M, van Ginneken B. Automatic lung segmentation from thoracic computed tomography scans using a hybrid approach with error detection. Med Phys. 2009;36:2934–2947. pmid:19673192
- 16. Liu D, Cai T, Zheng Y. Evaluating the predictive value of biomarkers with stratified case-cohort design. Biometrics. 2012;68:1219–1227. pmid:23173848
- 17. Pepe MS, Zheng Y, Jin Y, Huang Y, Parikh CR, Levy WC. Evaluating the ROC performance of markers for future events. Lifetime Data Anal. 2008;14:86–113. pmid:18064569
- 18. Zheng Y, Cai T, Pepe MS, Levy WC. Time-dependent Predictive Values of Prognostic Biomarkers with Failure Time Outcome. J Am Stat Assoc. 2008;103:362–368. pmid:19655041
- 19. Efron B, Tibshirani RJ. An introduction to the bootstrap. vol. 57. CRC Press; 1994.
- 20. Pencina MJ, D’Agostino RBS, Steyerberg EW. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med. 2011;30:11–21. pmid:21204120
- 21. Uno H, Tian L, Cai T, Kohane IS, Wei LJ. A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data. Stat Med. 2013;32:2430–2442. pmid:23037800
- 22. Dawkins PA, Dowson LJ, Guest PJ, Stockley RA. Predictors of mortality in alpha1-antitrypsin deficiency. Thorax. 2003;58:1020–1026. pmid:14645964
- 23. McWilliams A, Tammemagi MC, Mayo JR, Roberts H, Liu G, Soghrati K, et al. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med. 2013;369:910–919. pmid:24004118