Estimation of lung cancer risk using homology-based emphysema quantification in patients with lung nodules

The purpose of this study was to assess whether homology-based emphysema quantification (HEQ) is significantly associated with lung cancer risk. This retrospective study was approved by our institutional review board. We included 576 patients with lung nodules (317 men and 259 women; age, 66.8 ± 12.3 years), who were selected from a database previously generated for computer-aided diagnosis. Of these, 283 were diagnosed with lung cancer, whereas the remaining 293 had benign lung nodules. HEQ was performed and the percentage of low-attenuation lung area (LAA%) was calculated on computed tomography scans. Statistical models for estimating lung cancer risk were constructed using logistic regression, with sex, age, smoking history (Brinkman index), LAA%, and HEQ as independent variables. Three models were evaluated: the base model (sex, age, and smoking history); the LAA% model (base model + LAA%); and the HEQ model (base model + HEQ). Model performance was assessed using receiver operating characteristic analysis and the associated area under the curve (AUC). Differences in AUCs among the models were evaluated using DeLong's test. AUCs of the base, LAA%, and HEQ models were 0.585, 0.593, and 0.622, respectively. The HEQ coefficient was statistically significant in the HEQ model (P = 0.00487), whereas the LAA% coefficient was not significant in the LAA% model (P = 0.199). DeLong's test revealed a significant difference in AUCs between the LAA% and HEQ models (P = 0.0455). In conclusion, after adjusting for age, sex, and smoking history (Brinkman index), HEQ was significantly associated with lung cancer risk.


Introduction
Lung cancer is the leading cause of cancer-related deaths in the United States [1]. The National Lung Screening Trial demonstrated that screening high-risk individuals using low-dose computed tomography (CT) reduced lung cancer-related mortality by 20% [2], raising hope that detecting early-stage lung cancers may enable curative treatment. Moreover, of the 90 million current and former smokers in the United States, an estimated 9 million meet the criteria for CT screening [3]. Given this large number of potential screening participants, screening costs may become a major problem when determining inclusion criteria for current screening programs.
To limit screening costs, predictive models for lung cancer risk have been investigated in previous studies [4], [5]. In addition, it has been suggested that lung cancer risk could be stratified using spirometry measurements and CT-based emphysema evaluations [3]. Several studies have examined the association between lung cancer risk and CT-based emphysema evaluation [6], [7]. A meta-analysis confirmed a significantly increased odds ratio (OR) for lung cancer when emphysema was detected by visual assessment [7]. However, other studies have found no such association between quantitative emphysema evaluations and lung cancer risk [6], [7], [8]. Wille et al [8] investigated the association between lung cancer and visual or quantitative chest CT assessments and found that neither the percentage of low-attenuation lung area (LAA%) nor the 15th percentile density was associated with lung cancer, although lung cancer was significantly associated with visually assessed emphysema and interstitial abnormalities. Overall, the utility of quantitative emphysema evaluations remains controversial.
Homology methods have been used for medical image analysis in numerous studies [9], [10], [11], [12], [13], some of which demonstrated that homology-based emphysema quantification (HEQ) was useful for assessing the severity of emphysema and for predicting the results of visual emphysema scoring [12], [13]. These findings suggest that HEQ may also be useful for estimating lung cancer risk.
Therefore, the purpose of the present study was to evaluate whether HEQ is significantly associated with lung cancer risk and to construct and validate an estimation model for predicting lung cancer risk on the basis of HEQ and other clinical parameters. We hypothesized that HEQ is associated with lung cancer risk.

Materials and methods
This retrospective study was approved by the institutional review board of Kyoto University Hospital (number: R1054); the requirement for informed consent was waived. We used a database of lung nodules that was previously generated for computer-aided diagnosis [14], [15]. The database includes CT images and clinical information of 1,240 patients presenting with at least one lung nodule. The previous studies focused on the computer-aided diagnosis system [14], [15], which directly uses characteristics of the lung nodules; thus, the purpose of the current study differs from that of the previous studies.

Database and inclusion criteria
Most lung nodules in the database were diagnosed as one of three types: benign lung nodule, primary lung cancer, or metastatic lung cancer. In the present study, we focused on benign lung nodules and primary lung cancer. All lung cancers were pathologically confirmed. The diagnosis of benign lung nodules was based mainly on stability or shrinkage on CT, with stability confirmed by 2-year CT follow-up; 57 of the benign nodules were pathologically confirmed. CT scans covered the entire chest and were acquired using a 320- or 64-detector-row CT scanner (Aquilion ONE or Aquilion 64; Toshiba Medical Systems, Otawara, Japan) with automated exposure control. CT scan parameters were as follows: tube current, 109 ± 53.3 (range, 25-400) mA; gantry rotation time, 0.500 ± 0.0137 (range, 0.400-1.00) s; tube potential, 120 ± 1.69 (range, 120-135) kV; matrix size, 512 × 512; and slice thickness, 1 or 0.5 mm.
Patients who met the following three criteria were selected: (1) the lung nodule was diagnosed as benign or as primary lung cancer; (2) non-contrast CT scans were available; and (3) smoking history (Brinkman index) was clearly documented.

Emphysema quantification
The lungs were automatically segmented based on chest CT images using a dedicated algorithm [16], and three CT images of the upper, middle, and lower lung fields were selected for emphysema quantification [13], [17].
LAA% was calculated as follows. The lung area was evaluated (as a number of pixels) based on the lung segmentation of the three CT images, and the pixels within the lungs with attenuation lower than a predefined threshold were counted as low-attenuation lung pixels [18]; these values were used to calculate LAA%:

$$\mathrm{LAA\%} = \frac{\text{total number of low-attenuation lung pixels on the three CT images}}{\text{total number of lung pixels on the three CT images}} \times 100$$
During this process, binary versions of the CT images were created, with 1 indicating a normal lung pixel or a pixel outside the lung and 0 indicating a lung pixel with attenuation below the defined threshold. These binary images were used for HEQ.
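The pooled LAA% computation and the binary coding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `laa_percent`, the list-of-lists image representation, and the strict `<` comparison against the threshold are assumptions, and lung segmentation [16] is taken as given in the form of one boolean mask per slice.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # CT attenuation values in HU (assumed layout)
Mask = List[List[bool]]    # True where the pixel belongs to the lung


def laa_percent(
    slices: Sequence[Image], lung_masks: Sequence[Mask], threshold_hu: float
) -> Tuple[float, List[List[List[int]]]]:
    """Pool lung and low-attenuation pixel counts over several CT slices.

    Returns LAA% and one binary image per slice, coded as in the text:
    0 = lung pixel below the threshold (low attenuation),
    1 = normal lung pixel or pixel outside the lung.
    """
    total_lung = 0
    total_low = 0
    binaries = []
    for hu, mask in zip(slices, lung_masks):
        binary = []
        for hu_row, mask_row in zip(hu, mask):
            out_row = []
            for value, in_lung in zip(hu_row, mask_row):
                # Assumption: "lower than the threshold" means strictly below it.
                is_low = in_lung and value < threshold_hu
                total_lung += int(in_lung)
                total_low += int(is_low)
                out_row.append(0 if is_low else 1)
            binary.append(out_row)
        binaries.append(binary)
    if total_lung == 0:
        raise ValueError("lung masks contain no lung pixels")
    # Counts are summed over all slices before dividing, as in the formula.
    return 100.0 * total_low / total_lung, binaries
```

Pooling the counts before dividing weights each slice by its lung area, which is what the formula prescribes; averaging per-slice percentages would give a different value when lung areas differ.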
HEQ for the three CT images was performed as described elsewhere [12], [13]; the detailed HEQ process used in the present study is described in the supporting information (S1 File). The two previous studies used the Betti numbers for HEQ [12], [13]. In a two-dimensional image, homology is summarized by two Betti numbers, b0 and b1. In lung CT images, b0 corresponds to the number of low-attenuation lung regions, and b1 to the number of normal lung regions surrounded by low-attenuation lung regions. On CT images, b0 and b1 are related to the holes formed by emphysema. Examples of binary images and the corresponding Betti numbers are shown in the supporting information (S2 File). Using dedicated software [12], [13], the Betti numbers can be calculated from the binary CT images created when calculating LAA%. Thresholds for both LAA% and HEQ were −950, −910, and −880 Hounsfield units (HU).
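To make the Betti numbers concrete, the sketch below computes b0 and b1 for one binary image of the kind described above (0 = low-attenuation lung pixel, 1 = anything else). It is not the dedicated software of [12], [13]; the function names are hypothetical, and it assumes 8-connectivity for low-attenuation regions and 4-connectivity for their complement, a common dual pairing in 2D digital topology. Under that assumption, b0 is the number of connected low-attenuation regions and b1 is the number of complement regions completely enclosed by them.

```python
from collections import deque
from typing import List, Sequence, Tuple

# Assumed connectivity convention (the cited software may differ):
# 8-neighbourhood for low-attenuation pixels, 4-neighbourhood for the rest.
N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def count_components(grid: List[List[bool]],
                     neighbours: Sequence[Tuple[int, int]]) -> int:
    """Count connected components of True cells using breadth-first search."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in neighbours:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def betti_numbers(binary: List[List[int]]) -> Tuple[int, int]:
    """Return (b0, b1) for a binary image in which 0 marks low-attenuation lung."""
    cols = len(binary[0])
    # Pad with a one-pixel ring of 1s so that everything touching the image
    # border (e.g. pixels outside the lung) merges into one unbounded region.
    padded = [[1] * (cols + 2)]
    padded += [[1] + list(row) + [1] for row in binary]
    padded += [[1] * (cols + 2)]
    low = [[v == 0 for v in row] for row in padded]
    other = [[v != 0 for v in row] for row in padded]
    b0 = count_components(low, N8)
    # Enclosed complement regions are the holes; subtract the unbounded one.
    b1 = count_components(other, N4) - 1
    return b0, b1
```

For example, a 3 × 3 ring of low-attenuation pixels around one normal pixel, plus a separate isolated low-attenuation pixel, gives b0 = 2 and b1 = 1. The dual connectivity matters: a diamond of four diagonally touching low-attenuation pixels counts as one region that encloses its centre, rather than four regions with no hole.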

Statistical analysis
Age, sex, smoking history (Brinkman index), malignant tumor history, lung area, LAA%, and HEQ (b0 and b1) were compared between patients with and without lung cancer using chi-squared tests or t-tests to investigate the association between lung cancer and these parameters. An estimation model for lung cancer risk was then built using logistic regression. Before constructing the model, the best threshold for LAA% and HEQ was selected on the basis of the t-test results. The statistical models included sex, age, smoking history (Brinkman index), LAA%, and HEQ as independent variables. Three statistical models were evaluated: the base model (sex, age, and smoking history); the LAA% model (base model + LAA%); and the HEQ model (base model + HEQ). Model performance was assessed using the Akaike information criterion (AIC), receiver operating characteristic (ROC) analysis, and the associated areas under the curves (AUCs). Differences in AUCs between models were evaluated using DeLong's test. In addition, 10-fold cross-validation was performed to assess the robustness of the models. Finally, HEQ was binarized at an empirically determined threshold, and a second HEQ model (HEQb model) was constructed. The OR from the HEQb model was calculated to interpret the association between HEQ and lung cancer risk. P-values of <0.05 were considered significant. All analyses were performed using R version 3.3.2 (available at http://www.r-project.org/).
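DeLong's test compares two AUCs estimated on the same patients while accounting for their correlation. The study ran its analyses in R; the following is an independent, dependency-free Python sketch of the DeLong variance estimate (function names hypothetical), in which each model's predicted probabilities serve as scores and labels are 1 for lung cancer and 0 for benign nodules. It computes each AUC as the Mann-Whitney statistic, estimates the variance of the AUC difference from the per-case and per-control structural components, and returns a two-sided normal p-value.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the case scores higher, 0.5 on a tie, else 0."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _components(pos: Sequence[float], neg: Sequence[float]):
    """Return the AUC and DeLong's structural components V10 and V01."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]  # per case
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]  # per control
    return sum(v10) / len(pos), v10, v01


def _var_of_difference(a: List[float], b: List[float]) -> float:
    """Sample variance of a - b, equal to S_11 + S_22 - 2 * S_12."""
    d = [x - y for x, y in zip(a, b)]
    mean = sum(d) / len(d)
    return sum((v - mean) ** 2 for v in d) / (len(d) - 1)


def delong_test(labels: Sequence[int], scores1: Sequence[float],
                scores2: Sequence[float]) -> Tuple[float, float, float]:
    """Compare two correlated AUCs; returns (auc1, auc2, two-sided p)."""
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(cases), len(controls)
    auc1, v10_1, v01_1 = _components([scores1[i] for i in cases],
                                     [scores1[i] for i in controls])
    auc2, v10_2, v01_2 = _components([scores2[i] for i in cases],
                                     [scores2[i] for i in controls])
    var = (_var_of_difference(v10_1, v10_2) / m
           + _var_of_difference(v01_1, v01_2) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return auc1, auc2, p
```

Tied scores count one half, matching the usual Mann-Whitney convention. The pairwise loops cost O(mn) operations, which is adequate for a cohort of a few hundred patients.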

Results
Fig 1 presents the patient selection process.
A total of 576 patients (317 men and 259 women) were included, of whom 283 were diagnosed with lung cancer and 293 with benign lung nodules. Mean (± standard deviation) patient age was 66.8 ± 12.4 years, and mean Brinkman index (representing smoking history) was 647 ± 829.
Mean HEQ values at the three thresholds were as follows: −950 HU, b0 7770 ± 3100 and b1 4930 ± 3250; −910 HU, b0 3760 ± 2470 and b1 7300 ± 3010; and −880 HU, b0 2030 ± 1850 and b1 7470 ± 2410. Table 2 summarizes the results of the univariate statistical analysis. Age, smoking history (Brinkman index), b1 at −910 HU, b0 at −880 HU, and b1 at −880 HU differed significantly between patients with and without lung cancer. Conversely, sex, malignant tumor history, lung area, and LAA% at the three thresholds did not differ significantly. Based on the results in Table 2, −880 HU was selected as the best threshold for both LAA% and HEQ, and LAA% at −880 HU and b1 at −880 HU were used for model construction.
The results of the three models are summarized in Table 3. AUCs were as follows: base model, 0.585; LAA% model, 0.593; and HEQ model, 0.622. Although sex was not a significant parameter in the base and LAA% models, AIC was lower when sex was included in these two models. The LAA% coefficient at −880 HU was not significant in the LAA% model (P = 0.199), whereas the coefficient of b1 at −880 HU was statistically significant in the HEQ model (P = 0.00487). DeLong's test revealed a significant difference in AUC between the LAA% and HEQ models (P = 0.0455). The receiver operating characteristic curves for the three models are shown in the accompanying figure.
Table 4 shows the results of the HEQb model, which used binarized values of b1 at −880 HU. Before constructing this model, b1 at −880 HU was set to 1 when it was larger than 5100 and to 0 otherwise. The AUC of the HEQb model was 0.622 without 10-fold cross-validation and 0.602 with 10-fold cross-validation. The OR (95% confidence interval) for binarized b1 at −880 HU was 2.28 (1.43-3.73).
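The binarization used for the HEQb model and the conversion of a fitted logistic-regression coefficient into an OR can be sketched as follows. The 5100 cutoff is taken from the text; the function names are hypothetical, and the Wald interval shown, exp(β ± 1.96·SE), is one common choice that may differ slightly from the interval method used in the study, which the text does not specify.

```python
import math
from typing import List, Sequence, Tuple


def binarize(values: Sequence[float], cutoff: float = 5100.0) -> List[int]:
    """Code a predictor as 1 when it exceeds the cutoff and 0 otherwise."""
    return [1 if v > cutoff else 0 for v in values]


def odds_ratio_wald(beta: float, se: float,
                    z: float = 1.959964) -> Tuple[float, float, float]:
    """Convert a logistic coefficient and its SE into an OR with a Wald 95% CI.

    The OR is exp(beta): the multiplicative change in odds for x = 1 vs x = 0,
    with the other covariates in the model held fixed.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Note that a b1 value of exactly 5100 is coded 0, consistent with "larger than 5100" in the text. Because the interval is symmetric on the log scale, the OR is the geometric mean of its Wald limits.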

Discussion
In this study, we evaluated the association of HEQ and LAA% with lung cancer in patients with lung nodules. After adjusting for age, sex, and smoking history (Brinkman index), HEQ was significantly associated with lung cancer, whereas LAA% was not. Moreover, our findings indicated that the HEQ model was more effective at estimating lung cancer risk than the base and LAA% models. In the HEQb model, the OR (95% confidence interval) for binarized b1 at −880 HU was 2.28 (1.43-3.73), indicating that, on average, the odds of lung cancer in patients with high b1 (>5100) at −880 HU were 2.28 times those in patients with low b1 (≤5100).
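How an OR translates into a multiplicative change in odds follows from the logistic model. Writing x for the binarized predictor (x = 1 when b1 at −880 HU exceeds 5100; the symbols x and β are introduced here for illustration) and β for its coefficient, with the other covariates held fixed:

$$\log\frac{p}{1-p} = \beta_0 + \beta x + \cdots, \qquad \mathrm{OR} = \frac{\mathrm{odds}(x=1)}{\mathrm{odds}(x=0)} = e^{\beta}$$

$$\mathrm{OR} = 2.28 \;\Longrightarrow\; \hat{\beta} = \ln 2.28 \approx 0.824, \qquad \mathrm{odds}(x=1) = 2.28 \times \mathrm{odds}(x=0)$$

The factor $e^{2.28} \approx 9.78$ would apply only if 2.28 were the coefficient $\beta$ itself rather than the OR. That reading is also inconsistent with the reported interval, whose log-scale midpoint, $\sqrt{1.43 \times 3.73} \approx 2.31$, lies close to 2.28.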
A previous meta-analysis [7] showed that although visually assessed emphysema on CT was independently associated with lung cancer risk, automated emphysema detection (including LAA%) was not; the pooled ORs (95% confidence intervals) were 3.50 (2.71-4.51) for visual assessment of emphysema and 1.16 (0.48-2.81) for automated emphysema detection. Gietema et al showed that for moderate-to-severe emphysema on CT, visual assessment tended to overestimate the extent of emphysema compared with LAA% at −950 HU, whereas for smaller amounts of emphysema, radiologists tended to underestimate it [19]. Wilson et al suggested that automated densitometry for emphysema evaluation may be overly sensitive for distinguishing clinically meaningful emphysema with respect to lung cancer risk [20]. On the basis of these results, we hypothesized that, for estimating lung cancer risk, emphysema quantification should correlate strongly with visual assessment.
A previous study investigated the association of LAA% and HEQ with visual assessment of emphysema [13] and showed that LAA% at −875 HU and HEQ at −875 HU were strongly associated with the visual assessment scores. Therefore, in addition to −950 and −910 HU, we used a threshold of −880 HU for emphysema quantification in the present study. Because −880 HU is not the threshold typically used for LAA% in the literature, our LAA% results are difficult to compare with those of other studies.
Another previous study showed that HEQ was useful for evaluating the spatial distribution of low-attenuation lung regions in patients with and without chronic obstructive pulmonary disease [12]. Gietema et al showed that visual assessment of emphysema was affected by both LAA% and the spatial distribution of low-attenuation lung regions [19]; therefore, we speculated that HEQ could be more useful than LAA% for estimating lung cancer risk. Our results supported this speculation, showing that b1 at −880 HU was significantly associated with lung cancer risk in the HEQ and HEQb models.
Similar to the previous meta-analysis [7], our results showed no statistically significant association between LAA% and lung cancer risk at any of the three thresholds. However, the association between LAA% and lung cancer risk became stronger when a relatively high threshold (−880 HU) was used. We speculate that LAA% at a higher threshold might be significantly associated with lung cancer risk; however, we did not pursue this possibility because the present study was designed to evaluate the association between HEQ and lung cancer risk.
This study had several limitations. First, our results were obtained retrospectively from a CT database of patients with lung nodules who visited a single hospital [14], [15]. The patient demographics and lung cancer prevalence in the database differed from those of individuals undergoing CT screening: in our study, the frequencies of clinically meaningful lung nodules and lung cancers were high, and lung cancer prevalence was 49.1%. Therefore, our results must be validated in a screening population. Second, the database did not include low-dose CT. Although automated exposure control was used, the radiation exposure was higher than that in CT screening. Previous studies have investigated the effects of low-dose CT and iterative reconstruction on emphysema quantification, particularly on the size distribution of low-attenuation lung regions, and have shown that acceptable agreement in emphysema quantification between low-dose and standard-dose CT can be obtained with iterative reconstruction [21]. Hence, we expect that the present results could be replicated with low-dose CT images reconstructed using iterative reconstruction. Third, adjustment for confounders was limited: only patient age, sex, and smoking history (Brinkman index) were included in the models. Previous studies have shown that pulmonary function test results and a clinical diagnosis of chronic obstructive pulmonary disease were significantly associated with lung cancer risk [6], [22], and several risk prediction models have been investigated [4], [5]. In a future study, we aim to explore the added value of HEQ for estimating lung cancer risk when combined with these confounders and models.
Finally, nodule features (such as shape and size) were not evaluated because we focused on the usefulness of emphysema quantification for estimating lung cancer risk. Combining HEQ with nodule features may yield a better model for estimating lung cancer risk, although such a model could be used only when lung nodules are detected.

Conclusions
After adjusting for age, sex, and smoking history (Brinkman index), HEQ was significantly associated with lung cancer risk, and HEQ can potentially allow the stratification of lung cancer risk.