Novel high-resolution computed tomography-based radiomic classifier for screen-identified pulmonary nodules in the National Lung Screening Trial

Purpose Optimization of the clinical management of screen-detected lung nodules is needed to avoid unnecessary diagnostic interventions. Herein we demonstrate the potential value of a novel radiomics-based approach for the classification of screen-detected indeterminate nodules. Material and methods Independent quantitative variables assessing various radiologic nodule features such as sphericity, flatness, elongation, spiculation, lobulation and curvature were developed from the NLST dataset using 726 indeterminate nodules (all ≥ 7 mm; benign, n = 318; malignant, n = 408). Multivariate analysis was performed using the least absolute shrinkage and selection operator (LASSO) method for variable selection and regularization in order to enhance the prediction accuracy and interpretability of the multivariate model. The bootstrapping method was then applied for internal validation, and the optimism-corrected AUC was reported for the final model. Results Eight of the originally considered 57 quantitative radiologic features were selected by LASSO multivariate modeling. These 8 features include variables capturing Location: vertical location (Offset carina centroid z), Size: volume estimate (Minimum enclosing brick), Shape: flatness, Density: texture analysis (Score Indicative of Lesion/Lung Aggression/Abnormality (SILA) texture), and surface characteristics: surface complexity (Maximum shape index and Average shape index), and estimates of surface curvature (Average positive mean curvature and Minimum mean curvature), all with P<0.01. The optimism-corrected AUC for these 8 features is 0.939. Conclusions Our novel radiomic LDCT-based approach for indeterminate screen-detected nodule characterization appears extremely promising; however, independent external validation is needed.


Introduction
With approximately 160,000 deaths annually in the US, lung cancer continues to account for more cancer-related deaths than colon, prostate and breast cancer combined. [1] In 2011, the National Lung Screening Trial (NLST) demonstrated a 20% relative reduction in lung cancer mortality with annual low-dose computed tomography (LDCT). [2] These encouraging results triggered the widespread endorsement of lung cancer screening. However, large-scale implementation has been hampered by the high rate of false-positive LDCT studies. [3] In the NLST, approximately 40% of individuals randomized to LDCT screening had one or more pulmonary nodules identified during the study period, 96% of which were ultimately proven benign. [2] In addition to lung cancer screening, the increasing utilization of diagnostic chest computed tomography (CT) results in an estimated 1.5 million incidentally discovered indeterminate lung nodules in the US annually. With the implementation of LDCT lung cancer screening for the > 10 million US adults meeting the screening eligibility criteria, this number is estimated to increase substantially. [4] In summary, there appears to be a potential emerging global epidemic of newly detected lung nodules. [5] This increased detection of indeterminate pulmonary nodules in the absence of reliable non-invasive strategies to differentiate benign and malignant nodules will almost certainly result in an increase in iatrogenic mortality, treatment-related morbidity and health care costs. While unnecessary invasive diagnostic and therapeutic interventions were kept to a minimum in the NLST study, the management of indeterminate pulmonary nodules in clinical practice serving the general population remains a major challenge. [2]
Clinical risk calculators have significantly improved the management of indeterminate pulmonary nodules, but additional tools to distinguish benign from malignant nodules are needed, especially for intermediate-risk pulmonary nodules, in order to minimize patient anxiety, radiation exposure, health care costs, and procedural morbidity and mortality. [6][7][8][9][10][11] We have previously demonstrated that quantitative volumetric CT-based nodule characterization effectively risk-stratifies lung nodules of the adenocarcinoma spectrum. [12][13][14][15][16] In addition, we have recently reported in a Lung Tissue Research Consortium-based case-control study that radiological features of the lung tissue surrounding the nodule are potentially valuable in distinguishing benign from malignant lung nodules (manuscript submitted). This approach eliminates intra- and inter-observer variability and is independent of the training level of the interpreting radiologist. In addition, modern digital CT images include a large amount of valuable high-dimensional data that is currently not fully utilized beyond contributing to the radiologist's overall impression ("gestalt"). This invaluable resource can be leveraged by modern quantitative imaging methods. Radiomic approaches to lung nodule analysis consist of extracting reproducible and objective quantitative radiological variables from CT datasets, reducing large volumes of complex data to manageable and clinically relevant information. [17] These quantitative imaging techniques have been proposed to facilitate the development of diagnostic and prognostic models in lung imaging, allowing, for example, the risk-stratification of lung adenocarcinomas, the classification of screen- or incidentally detected lung nodules and the characterization of lung cancer subtypes and tumor heterogeneity. [14,18-23]
In this study, we used the NLST dataset to develop and internally validate a radiological multivariate model to distinguish malignant from benign CT-screen detected indeterminate pulmonary nodules.

Subject selection
The Mayo Clinic and Vanderbilt University Institutional Review Boards approved or exempted this study (IRB numbers: Vanderbilt University 151500 and Mayo Clinic 15-002674). All participants for the present study were selected from the pool of eligible participants in the NLST, and all patient data were fully anonymized. The methods of the NLST have been published elsewhere. [2,24] Briefly, the NLST was a randomized controlled trial conducted at 33 US centers and approved by the institutional review boards at all centers. From August 2002 through April 2004, the study recruited asymptomatic high-risk individuals aged 55 to 74 years with a smoking history of at least 30 pack-years who, if former smokers, had quit 15 years or less prior to randomization. Individuals were screened with either annual low-dose CT or chest X-ray for three years and followed through December 31, 2009. A total of 26,722 individuals were randomized to the low-dose CT arm, and over 10,000 nodules (4-30 mm in longest diameter) were detected during the screening rounds.
Participants for the present study were selected from the pool of eligible participants in the NLST, who did not withdraw from follow-up, in the CT arm of the study (N = 26,262) and included all screen-detected lung cancer cases: adenocarcinomas, squamous cell carcinomas, large cell carcinomas, small cell carcinomas and carcinoid tumors. Non-lung cancer controls were selected as a stratified random sample from all participants without a diagnosis of lung cancer during the screen or follow-up periods of the NLST. Cases with more than one nodule were excluded. We restricted our analysis to pulmonary nodules with a size defined by a largest diameter between 7 and 30 mm as reported in the NLST database.

Screening HRCT data
All NLST screening scans were low-dose scans with 2.5 mm collimation or less as pre-defined by strict NLST criteria, the details of which have been published elsewhere. [24] The CT datasets were obtained from the Lung Screening Study core laboratory and transferred to a hard drive that was shipped to the investigators. The datasets from the American College of Radiology Imaging Network core laboratory were transferred initially via hard drive, then electronically to the investigators. Information on nodule location was available to the investigators in the NLST database and confirmed by one radiologist (B.J.B.) and two pulmonologists (F.M. and T.P.) using the CT obtained closest in time to the diagnosis of malignant or benign lung nodules. Nodules were electronically tagged for segmentation and analysis. HRCT without visible nodules, nodules with borders indistinguishable from neighboring structures (e.g. mediastinum or pleura) and nodules without related clinical data were excluded.

Optimization and validation of nodule segmentation
The lung nodules were segmented manually using the ANALYZE software (Biomedical Imaging Resource, Mayo Clinic, Rochester, MN). The location and the extent of each nodule were identified visually, and a stack of two-dimensional borders was traced out along the transverse orientation. A semi-automated region-growing approach based on the operator-specified bounding cube enclosing the nodule and a seed location within the nodule was used for initial segmentation (see supporting information). Manual editing was performed to remove, if needed, intruding structures such as vessels and pleura. A parametric feature-based region-growing technique based on the texture classification of the voxels within the operator-specified bounding cube was used as previously described. [25]
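To make the seed-and-bounding-cube idea concrete, the following is a minimal sketch of region growing restricted to an operator-specified box. It is not the ANALYZE implementation: the function name `region_grow`, the 6-connected neighborhood and the intensity `tolerance` criterion are all assumptions made for illustration, and the technique described above grows on a texture classification of the voxels rather than on raw intensity.

```python
from collections import deque

# Hypothetical helper: grows a region from a seed voxel inside a bounding box.
# Assumption: a voxel joins the region if its intensity is within `tolerance`
# of the seed intensity (a stand-in for the texture-based criterion).
def region_grow(volume, seed, box, tolerance):
    """volume[z][y][x] holds intensities; box = ((z0, z1), (y0, y1), (x0, x1)),
    lower bound inclusive, upper bound exclusive. The seed must lie in the box."""
    (z0, z1), (y0, y1), (x0, x1) = box
    ref = volume[seed[0]][seed[1]][seed[2]]
    region = {seed}
    queue = deque([seed])
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in steps:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (nz, ny, nx) in region:
                continue
            # Never leave the operator-specified bounding box.
            if not (z0 <= nz < z1 and y0 <= ny < y1 and x0 <= nx < x1):
                continue
            if abs(volume[nz][ny][nx] - ref) <= tolerance:
                region.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return region
```

Tightening the box is what keeps the growth from leaking into adjacent vessels or pleura of similar density; whatever still leaks would be removed in the manual editing step.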

Radiomic features
A comprehensive set of automatically computable, quantitative radiomic metrics was included for the development of a multivariable predictive model to discriminate benign from malignant lung nodules. Based on previous data and preliminary analysis (S1 File), we considered metrics within the following categories: general characteristics of the nodule (size and location), nodule characteristics (radiodensity, texture and surface characteristics) and features of the nodule-free surrounding lung, as listed below (Table 1).

Development of Score Indicative of Lesion/Lung Aggression/Abnormality (SILA)
Current literature suggests that no single quantitative metric exists to differentiate benign and malignant nodules. However, multivariate predictive models based on an ensemble of nodule texture/density, surround texture/density, nodule surface and other shape descriptors could improve the discriminability. To facilitate the multivariate analysis, we investigated the possibility of replacing our previously developed categorization of nodule texture/density and surface, which was based on unsupervised stratification, with continuous variables that can be thresholded at multiple levels to provide, if needed, the necessary categorization. We developed SILA to map the nine nominal texture/surface exemplar distributions of the nodule onto a continuous scale. The nine nominal exemplar distributions can be ordinated in 362,880 (9 factorial) ways. To identify the unique ordination that correlates with the virulence/malignancy of the nodule, we used qualitative spatial reasoning and multi-dimensional scaling. Based on this, the nine texture exemplars, arbitrarily labeled V, I, B, G, Y, O, R, C and P, were ordinated as V-R-O-I-Y-P-B-G-C, identical to the order used to represent the distributions via the glyphs (Figure C in S1 File). The nine surface exemplars were ordinated as unknown, minimal surface, valley, flat, ridge, pit, saddle valley, saddle ridge, peak. SILA was computed as the Cramér-von Mises distance of the ordinated exemplar distributions. Using a similar strategy, the seven primal parenchymal

Multivariate model
Quantitative methods were developed to characterize independent radiological variables assessing various radiologic nodule features. Univariate analysis of the discriminatory power of each radiologic variable was performed using receiver operating characteristic (ROC) analysis, and an area under the curve (AUC) was calculated for each variable. Statistical significance was calculated and adjusted for multiple comparisons using Bonferroni correction. Spearman rank correlations between all pairs of variables were calculated and displayed via a heat map. Multivariate analysis was performed using the least absolute shrinkage and selection operator (LASSO) method for both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the multivariate statistical model. To increase the stability of the modeling, LASSO was run 1,000 times and the variables that were selected in at least 50% of the runs were included in the final multivariate model. [26] The bootstrapping method was then applied for internal validation, and the optimism-corrected AUC was reported for the final model.
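The two resampling steps described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's code: `select` and `fit` are hypothetical caller-supplied callables (in practice they would wrap an L1-penalized regression, which is not implemented here), labels are assumed to be coded 0/1 with both classes present, and the optimism estimate follows the common bootstrap recipe of subtracting the average gap between bootstrap-sample AUC and original-sample AUC.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_indices(y, rng):
    """Draw row indices with replacement, redrawing until both classes are present."""
    n = len(y)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({y[i] for i in idx}) > 1:
            return idx

def selection_frequency(X, y, select, n_runs=1000, seed=0):
    """Fraction of bootstrap runs in which select(Xb, yb) keeps each feature index."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    for _ in range(n_runs):
        idx = bootstrap_indices(y, rng)
        for j in select([X[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    return [c / n_runs for c in counts]

def optimism_corrected_auc(X, y, fit, n_boot=200, seed=0):
    """fit(X, y) returns a scoring function row -> float.
    Returns (apparent AUC, apparent AUC minus mean bootstrap optimism)."""
    rng = random.Random(seed)
    score = fit(X, y)
    apparent = auc([score(r) for r in X], y)
    optimism = 0.0
    for _ in range(n_boot):
        idx = bootstrap_indices(y, rng)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        score_b = fit(Xb, yb)  # refit the whole modeling procedure on the resample
        optimism += auc([score_b(r) for r in Xb], yb) - auc([score_b(r) for r in X], y)
    return apparent, apparent - optimism / n_boot
```

With these helpers, the 50% rule amounts to keeping `[j for j, f in enumerate(freqs) if f >= 0.5]`, where `freqs` is the output of `selection_frequency`. For the correction to be honest, `fit` should repeat every data-driven step, including variable selection, on each resample.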

Study participants
We reviewed 649 LDCT scans of cancers diagnosed in the screening arm of the NLST that included 353 adenocarcinomas, 136 squamous cell carcinomas, 28 large cell carcinomas, 75 non-small cell carcinomas, 49 small cell carcinomas and 5 carcinoid tumors. After exclusion of cases lacking HRCT data, cases with no apparent lesion on the last HRCT prior to the cancer diagnosis, cases with nodules invading the mediastinum, cases with missing outcome data, and lesions < 7 mm or > 30 mm in size, 408 LDCT scans with malignant nodules were selected and analyzed. A stratified random sample of non-lung cancer control nodules (size between 7 and 30 mm) was selected on a 1:1 basis, and 318 benign nodules were selected and included in the analysis (Fig 1). The demographic and clinical characteristics of individuals included in the study are summarized in Table 1.
In order to prevent overfitting of the model, we only considered quantitative imaging variables that were known a priori to be potentially associated with the benign or malignant nature of lung nodules (see supplemental material). Quantitative methods were developed to characterize independent radiological variables assessing various radiologic nodule features, including 1. Nodule location, 2. Nodule size, 3. Nodule shape, 4. Nodule radiodensity, 5. Nodule texture, 6. Texture/radiodensity of the nodule-free surrounding lung, 7. Nodule surface characteristics and 8. Distribution of the nodule surface characteristics exemplars, using 726 nodules identified from the NLST dataset (benign, n = 318 and malignant, n = 408) (Table 2).

Multivariate analysis
All 57 quantitative imaging variables were included in the LASSO regression model in order to select the optimal variables, adjust the regression coefficients to optimize the transportability (external validity) of the model, determine the degree of optimism of the model and perform an optimism-corrected ROC analysis of model performance. Multivariate analysis using LASSO on all features yielded a multivariate model with 8 selected features (selected with frequency > 50% over 1,000 runs, with bootstrap resampling introduced to reduce variability) and an AUC estimate of 0.941 (Fig 2). These 8 features include: 1. Offset carina centroid_z (Nodule location), 2. Minimum enclosing brick (Nodule shape), 3. Nodule flatness (Nodule shape), 4. SILA nodule texture (Nodule texture), 5. Maximum shape index (Nodule surface characteristics), 6. Average shape index (Nodule surface characteristics), 7. Average positive mean curvature (Nodule surface characteristics) and 8. Minimum mean curvature (Nodule surface characteristics), all with P<0.01. To correct for overfitting (internal validation), we used the bootstrapping technique to estimate the optimism of the AUC. The optimism-corrected AUC is 0.939 (Fig 2). Using Youden's index, we obtained the optimal cutoff at 0.478 with sensitivity 0.904 and specificity 0.855. A subset analysis of nodules with size between 7
Offset carina centroid_z captures the location of the nodule along the vertical axis relative to the carina; the minimum enclosing brick and flatness capture shape and volume; SILA texture is a summary variable capturing the nodule texture; and the surface characteristics of the nodule are captured by the maximum and average shape index (complexity of the nodule surface) and by the average positive mean curvature and minimum mean curvature (degree of curvature of the outer surface of the nodule).
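For readers unfamiliar with Youden's index, the cutoff is the threshold on the predicted score that maximizes J = sensitivity + specificity - 1; the operating point reported above corresponds to J = 0.904 + 0.855 - 1 = 0.759. A minimal sketch follows, assuming labels coded 0/1 and the convention that a score at or above the cutoff is called malignant. The function name `youden_cutoff` and the tie-breaking rule (the lowest cutoff wins) are illustrative choices, not taken from the study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.
    A case is called positive when its score is >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(scores)):  # every observed score is a candidate cutoff
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # strict '>' keeps the lowest cutoff on ties
            best = (j, c, sens, spec)
    return best[1:]
```

Youden's index weights false negatives and false positives equally; a screening program that values sensitivity more highly could reasonably choose a lower cutoff.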

Discussion
In this study, we report the development and the performance of an internally-validated multivariate radiomic model to differentiate malignant and benign screen-identified indeterminate lung nodules. Using a large lung cancer screening dataset of images obtained with a broad spectrum of CT scanners, acquisition protocols and reconstruction kernels, we demonstrate that our automated radiomic approach reliably distinguishes benign from malignant nodules. This approach, if externally validated, could inform management of screen-identified pulmonary nodules and potentially minimize morbidity, mortality, health care costs, radiation exposure and patient anxiety associated with the currently accepted approach for the evaluation and management of indeterminate pulmonary nodules.
To eliminate "agnostic" variables with unknown or improbable clinical significance, we preselected quantitative imaging features with known potential associations with the benign or malignant nature of lung nodules for our model. In addition to standard nodule descriptors such as size and location, we include variables capturing nodule surface characteristics, density and characteristics of the nodule-free surrounding lung. Although a number of these additional features may influence the subjective assessment by trained radiologists, they currently cannot be accurately measured clinically. [27] While predictive in the univariate analysis, features of the immediate nodule-free surrounding lung, as determined by quantitative estimates of low-attenuation (emphysema), ground-glass and reticular changes within 10 mm of the segmented boundaries of the nodule, were not found to be useful predictors after LASSO selection of candidate predictors. Interestingly, nodule size was not one of the eight selected variables. The only selected variable potentially related to size was the minimum enclosing brick. In order to evaluate the performance of our model without nodule size as a variable, the optimism-corrected AUC was calculated after removing each variable in turn (Table 3). The AUC for the 7-variable model without minimum enclosing brick was 0.929, suggesting that nodule size did not exert a disproportionate influence on the final model.
If externally validated, the excellent diagnostic test performance of our multivariate model could significantly advance the management of patients with screen-detected indeterminate pulmonary nodules. The development of this model based on a large and technically heterogeneous screening dataset, including a geographically diverse population and various CT scanners and acquisition protocols, strengthens the external validity of our study. [2,24] In addition, all analyzable nodules from the NLST were included in modeling, which used model selection through shrinkage (LASSO) and bootstrap analysis, allowing adjustment for overfitting and validation of the modeling process. [26] One of the main limitations to broad implementation of lung cancer screening remains the large number of false-positive screening CT scans. In order to mitigate this problem and decrease unnecessary patient complications, radiation exposure and patient anxiety, the nodule size threshold for a positive screening study was raised to 6 mm. [28][29][30] This size threshold has accordingly been endorsed by several other societies such as the Fleischner Society. [31] We selected a threshold of 7 mm in our study for its similarity with this threshold, and also for consistency with the DECAMP-1 study we are planning to use for external validation (NCT01785342). While this 6 mm threshold is unquestionably an improvement over the NLST criteria, the number of false-positive CT scans remains substantial, and this problem is likely to persist as screening is more broadly implemented and eligible individuals are screened over longer time periods. Another potentially fruitful avenue of research is the application of longitudinal volumetric assessment of screen-identified lung nodules, which has been associated with a substantial reduction in the incidence of false-positive CT scans as well. [32,33] In fact, the recent European position statement on lung cancer screening endorses volumetric analysis for lung nodule assessment. [34]
While some blood or bronchoscopy-based biomarkers have been proposed to facilitate nodule classification, they require additional invasive procedures, which may be difficult to generalize at the population level. [25,35-39] Leveraging existing and currently unexploited data to refine the sensitivity and specificity of LDCT would therefore be desirable. Our radiomics classifier compares favorably to currently existing clinical, blood-, tissue- or radiology-based prediction models and focuses specifically on lung nodule variables considered clinically relevant. Rather than replacing current clinical-based assessment of lung nodules based on size or volumetric analysis, we believe that our classifier could represent an adjunct diagnostic tool to inform clinical decisions for intermediate-risk indeterminate pulmonary nodules. This radiomics approach would also not require additional expensive imaging such as PET-CT, as required by other additive models. [7,8,10] There are several limitations to our work. First, our model has not yet been externally validated, which is required before it is used clinically. The prevalence of malignancy in our cohort is > 50%, which is distinctly higher than in a typical screening cohort including similar size lesions (12%). Consequently, it is unclear how our model will perform in independent screening cohorts with a more typical prevalence of malignancy. If our model cannot be validated, it may have to be adjusted based on the validation cohort. However, we used an optimal internal validation model (LASSO), which not only surpasses conventional internal validation approaches (split sample and cross validation), but also penalizes the model to avoid overfitting and optimizes the generalizability of the model.
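The effect of prevalence on predictive values can be illustrated with Bayes' rule. The sketch below is a back-of-the-envelope calculation, not a result of the study: it assumes that the reported sensitivity (0.904) and specificity (0.855) would carry over unchanged to a cohort with 12% prevalence, which is exactly what external validation would need to establish. The function name `predictive_values` is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Prevalence in the development cohort: 408 malignant of 726 nodules.
ppv_study, npv_study = predictive_values(0.904, 0.855, 408 / 726)
# Assumed screening-like prevalence of 12%, the figure quoted in the text.
ppv_screen, npv_screen = predictive_values(0.904, 0.855, 0.12)
```

Under these assumptions the positive predictive value falls from about 0.89 in the development cohort to about 0.46 at 12% prevalence, while the negative predictive value rises to about 0.98, which is why performance in a cohort with typical prevalence cannot be read directly from the present results.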
Second, the model was developed from a very heterogeneous sample of the NLST CT dataset and we found the selected radiomic features to be robust and stable across CT platforms, acquisition protocols and reconstruction kernels, which we believe strengthens the reproducibility of our model.
Third, the semi-automatic segmentation technique used in this study, with manual adjustment by the investigators, could admittedly introduce operator-driven variability in radiomic analysis. However, we have recently analyzed the reproducibility of radiomic analysis of adenocarcinomas using the same segmentation technique and found an excellent intraclass correlation coefficient of 0.828 (95% CI 0.76, 0.895) for the Vanderbilt cohort of 50 adenocarcinomas. [40] We believe that these results support the external validity of our work.
Fourth, the relatively small number of cases did not allow us to exclude the influence of clinical or demographic variables known to affect lung cancer risk. We did, however, include additional clinical variables known to strongly influence the risk of lung cancer (age and smoking history in pack-years) and found that these variables did not improve the performance of the model. Fifth, it is unclear whether our model will extend to other lung nodule cohorts, such as incidentally detected lung nodules. Future validation of our model in other settings is indeed warranted.
Finally, it should be noted that all lung cancer cases suitable for analysis from the NLST were included in our study, some of which were at an advanced stage (see Table 2). This could potentially limit the external validity of our model when applied to indeterminate pulmonary nodules. However, most of the included cancer cases were stage I, which should mitigate this risk.
In summary, we present a promising novel radiomics CT-based approach to lung nodule classification, which we believe could revolutionize our approach to screen-detected indeterminate pulmonary nodules and mitigate the risks inherent in lung cancer screening by minimizing unnecessary mortality, morbidity, radiation exposure, patient anxiety and healthcare costs.

Supporting information
S1 File. Figure A. Analysis of the CALIPER texture features within the lung nodules. The texture features within the shaded region do not appear within the lung nodules. Table A. Algorithmic components of nodule surface characterization and the strategy used during the pilot study and current improvements. Table B. List of quantitative metrics used in the discrimination of benign and malignant nodules. The pval, 95% CI and the probability plot correlation coefficient (PPCC) are given in the last column for benign (N = 319) and malignant (N = 338) nodules. Figure C. Mosaic showing the glyphs (A, D), the nodule distribution within the upper, middle, lower left and right lung (B, E) and the Score Indicative of Lesion Abnormality (SILA) for the NLST malignant and benign nodules used in this study. The glyphs are ordered in Panels A and D based on the nodule-specific SILA values; the SILA values in Panels C and F are color coded in green, yellow and red based on the previously developed CANARY categorization.