Respiration-Averaged CT for Attenuation Correction of PET Images – Impact on PET Texture Features in Non-Small Cell Lung Cancer Patients

Purpose We compared attenuation correction of PET images with helical CT (PET/HCT) and respiration-averaged CT (PET/ACT) in patients with non-small-cell lung cancer (NSCLC) with the goal of investigating the impact of respiration-averaged CT on 18F FDG PET texture parameters. Materials and Methods A total of 56 patients were enrolled. Tumors were segmented on pretreatment PET images using the adaptive threshold. Twelve different texture parameters were computed: standard uptake value (SUV) entropy, uniformity, entropy, dissimilarity, homogeneity, coarseness, busyness, contrast, complexity, grey-level nonuniformity, zone-size nonuniformity, and high grey-level large zone emphasis. Comparisons of PET/HCT and PET/ACT were performed using Wilcoxon signed-rank tests, intraclass correlation coefficients, and Bland-Altman analysis. Receiver operating characteristic (ROC) curves as well as univariate and multivariate Cox regression analyses were used to identify the parameters significantly associated with disease-specific survival (DSS). A fixed threshold at 45% of the maximum SUV (T45) was used for validation. Results SUV maximum and total lesion glycolysis (TLG) were significantly higher in PET/ACT. However, texture parameters obtained with PET/ACT and PET/HCT showed a high degree of agreement. The lowest levels of variation between the two modalities were observed for SUV entropy (9.7%) and entropy (9.8%). SUV entropy, entropy, and coarseness from both PET/ACT and PET/HCT were significantly associated with DSS. Validation analyses using T45 confirmed the usefulness of SUV entropy and entropy in both PET/HCT and PET/ACT for the prediction of DSS, but only coarseness from PET/ACT achieved the statistical significance threshold. Conclusions Our results indicate that 1) texture parameters from PET/ACT are clinically useful in the prediction of survival in NSCLC patients and 2) SUV entropy and entropy are robust to attenuation correction methods.


Introduction
Despite significant advances in targeted therapy [1], the prognosis of patients with non-small cell lung cancer (NSCLC) remains dismal. 18F-FDG PET imaging plays an essential role in the diagnosis and staging of NSCLC. Recent years have witnessed an increased use of FDG PET imaging for clinical decision making [2]. Accordingly, maximum standardized uptake values (SUVmax) [3,4] and total lesion glycolysis (TLG) [5,6] have been shown to be clinically useful for predicting treatment response and clinical outcomes of NSCLC patients. Tumor heterogeneity has been generally related to poor prognosis and treatment resistance [7]. Growing evidence indicates that FDG PET texture features reflecting tumor heterogeneity may predict therapeutic response and survival in NSCLC [8–11] and numerous other malignancies [12–20]. The mining of a large number of quantitative image features has been referred to as "radiomics" and holds promise as a method for identifying specific prognostic signatures [10,21–26]. In this scenario, accurate and reproducible measurements of texture features are essential for clinical use. Unfortunately, respiratory motion during PET/CT results in degradation of image quality and hampers the correct quantification of imaging parameters [27–29]. Such deterioration is caused by the discrepancy in chest position between helical CT (HCT) and PET images [30,31]. Whereas HCT images capture a snapshot of the respiratory cycle, PET scans represent an average over many respiratory cycles. Notably, this temporal difference between PET and CT ultimately introduces misalignment artifacts in PET images. To address this issue, attenuation correction of PET images with respiration-averaged CT (ACT) has been utilized. Rather than providing a motionless CT image, ACT simulates the PET acquisition process by averaging the signal during a respiratory cycle from multiple low-dose cine CT images.
Previous studies have shown that ACT correction can improve the quality of PET images by reducing misalignments and optimizing the quantification of SUV [31–34]. Nevertheless, the utility of ACT for correcting PET texture features has not been thoroughly investigated. In addition, data on the potential prognostic impact of different PET texture features in NSCLC patients remain scarce. Because of the increasing use of 18F-FDG PET/CT in clinical trials, the variability related to attenuation correction is worthy of investigation. We therefore designed the current study to compare attenuation correction of PET images with HCT and ACT in NSCLC with the goal of investigating the impact of respiration-averaged CT on FDG PET texture parameters.
The study population consisted of patients with pathologically proven NSCLC who were scheduled to undergo definitive treatment with curative intent. All of the study participants underwent 18F-FDG PET for disease staging. The treatment approach was as follows: 1) radical surgery for stage IA patients, 2) radical surgery plus adjuvant chemotherapy for stage IB and IIA patients, 3) neoadjuvant therapy (chemotherapy, radiotherapy, or concurrent chemoradiotherapy [CCRT]) and operation for stage IIB and IIIA patients, and 4) chemotherapy, radiotherapy, or CCRT and operation (in the presence of resectable disease) for stage IIIB patients. Patients with M1 disease were excluded. Patients were staged according to the 2010 (7th edition) American Joint Committee on Cancer (AJCC) staging system. We retrospectively reviewed the clinical charts to extract the general characteristics and the clinical outcomes of the participants. Disease-specific survival (DSS), defined as the time from diagnosis to NSCLC-related death, served as the main outcome measure.

PET/CT imaging protocol
Patients were asked to fast for 6 h before the examination. According to our institutional policy, patients with blood glucose levels greater than 200 mg/dL had their scan rescheduled. All participants were imaged using the same PET/CT scanner (Discovery ST16, GE Healthcare). Scans were obtained 50 min after intravenous FDG administration. The injected dose of FDG was calculated according to body weight and ranged between 370 and 555 MBq. HCT data were acquired with the following settings: 120 kV, automatic mA (range: 10−300 mA), pitch 1.75:1, collimation 16×3.75 mm, and rotation cycle 0.5 s. Whole-body PET emission scans were performed in the 2D mode and acquired from the skull to the mid-thigh. Following HCT and PET acquisition, a low-dose cine CT was performed using the following settings: 120 kV, automatic mA (range: 10−25 mA according to the patient's body weight), rotation cycle 0.5 s, collimation 8×2.5 mm, and cine duration 5.9 s. The goal was to include lung fields bilaterally from the pulmonary apex to the dome of the liver [32,35–37]. Ten phases of cine CT images were averaged to obtain ACT. No intravenous contrast enhancement was used, and imaging was performed in the free-breathing state. No pre- or in-scan breathing coaching for respiratory control was used. Attenuation correction of PET images was performed with both HCT and ACT attenuation maps using the same PET emission data [30,32,35]. Transaxial emission images were reconstructed using ordered subsets expectation maximization (OSEM) with 4 iterations and 10 subsets. PET images were reconstructed on a 128 × 128 image matrix with a voxel size of 4.46 × 5.46 × 3.27 mm³ for both PET/HCT and PET/ACT. An additional dose of 2.5 mSv (5 mGy) was used for patients with a body weight > 70 kg [38].
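The averaging step that produces ACT from the cine CT phases can be illustrated with a minimal sketch. It assumes that each phase is available as a flattened list of Hounsfield-unit values on a common voxel grid; the function name `average_ct` and this data layout are hypothetical and do not reflect the scanner's actual processing chain.

```python
def average_ct(phases):
    """Voxel-wise mean across cine CT phases.

    phases: list of equally sized, flattened images (lists of HU values),
    one per respiratory phase. This layout is a hypothetical simplification.
    """
    if not phases:
        raise ValueError("at least one phase is required")
    n_voxels = len(phases[0])
    if any(len(p) != n_voxels for p in phases):
        raise ValueError("all phases must have the same number of voxels")
    n = len(phases)
    # zip(*phases) groups the values of the same voxel across all phases
    return [sum(voxel) / n for voxel in zip(*phases)]


# Two toy phases of a two-voxel image (values in Hounsfield units)
act = average_ct([[0.0, -1000.0], [100.0, -800.0]])
```

In the protocol described above, ten phases would be passed instead of two; the arithmetic is otherwise unchanged.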

PET/CT image data analysis
The PMOD 3.3 software package (PMOD Technologies Ltd, Zurich, Switzerland) was used for tumor segmentation. We applied two methods for tumor segmentation, i.e., (1) the adaptive threshold approach in the exploratory analysis and (2) 45% of SUVmax (T45) for validation purposes. The adaptive threshold was derived from the mean intensity of the voxels contoured at 70% of the tumor SUVmax (Imean70%) and the background mean SUV [39]. The aortic arch was used for background measurement, and none of the study participants had aortic arch invasion. Two authors (N.M.C. and T.C.Y.) contoured the aortic arch using 1 cm³ cubic volumes of interest (VOI), and the results were averaged. Care was taken to exclude calcified regions in the aortic arch. Finally, the adaptive threshold was calculated according to the following formula: 0.15 × Imean70% + background. The T45 approach has been previously utilized for delineation of NSCLC tumors [8]. SUVmax, mean SUV, and texture features were determined using the tumor VOI. TLG was calculated as follows: TLG = mean SUV × metabolic tumor volume (MTV) [40]. We did not analyze nodal lesions because of their small size.
Histogram analysis, normalized grey-level co-occurrence matrix (NGLCM) [41,42], neighborhood grey-tone difference matrix (NGTDM) [43], and grey-level size zone matrix (GLSZM) [44] were applied for the calculation of PET texture features. Because numerous texture features have been reported [12,45,46], we specifically focused on those utilized for predicting survival in patients with malignancies. Several texture parameters, including entropy, uniformity, homogeneity, and dissimilarity from NGLCM [11,14,47,48]; grey-level nonuniformity (GLNU), zone-size nonuniformity (ZSNU), and high grey-level large zone emphasis (HGLZE) from GLSZM [14,19]; and NGTDM-based coarseness, busyness, contrast, and complexity [8], have been used for survival prediction in patients with NSCLC, esophageal cancer, and head and neck malignancies. In addition, we evaluated SUV entropy based on histogram analysis because of its robustness to different reconstruction settings [45,49]. A total of 12 different texture features were examined. The intensity values of the recorded VOIs were initially resampled into 64 bins to normalize images and reduce noise for the calculation of texture features [13]. The computations for texture features were performed using the Chang-Gung Image Texture Analysis toolbox (CGITA) implemented under MATLAB 2012a (Mathworks Inc., Natick, MA, USA). The mathematical models for texture matrices and the calculation process have been previously described in detail [14,50].
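The two formulas in this paragraph (adaptive threshold = 0.15 × Imean70% + background; TLG = mean SUV × MTV) can be traced with a small worked sketch. It is not the PMOD implementation: the tumor is treated as a flat list of voxel SUVs with no spatial connectivity, and the function names, the toy values, and the voxel volume of 0.5 mL are assumptions made purely for illustration.

```python
from statistics import mean


def adaptive_threshold(tumor_suvs, background_suv):
    """Return 0.15 x Imean70% + background for a list of voxel SUVs."""
    suv_max = max(tumor_suvs)
    # Voxels enclosed by the 70%-of-SUVmax contour (assumed: simple cutoff)
    core = [v for v in tumor_suvs if v >= 0.70 * suv_max]
    i_mean_70 = mean(core)
    return 0.15 * i_mean_70 + background_suv


def total_lesion_glycolysis(tumor_suvs, threshold, voxel_volume_ml):
    """TLG = mean SUV x MTV over the voxels at or above the threshold."""
    segmented = [v for v in tumor_suvs if v >= threshold]
    if not segmented:
        return 0.0
    mtv = len(segmented) * voxel_volume_ml  # metabolic tumor volume in mL
    return mean(segmented) * mtv


# Toy example: five voxels, background (aortic arch) mean SUV of 2.0
suvs = [1.0, 2.0, 4.0, 8.0, 10.0]
threshold = adaptive_threshold(suvs, background_suv=2.0)
tlg = total_lesion_glycolysis(suvs, threshold, voxel_volume_ml=0.5)
```

In this toy case the 70% contour keeps the voxels with SUV 8 and 10 (Imean70% = 9), so the threshold is 0.15 × 9 + 2 = 3.35. Three voxels exceed it, giving MTV = 1.5 mL, a mean SUV of about 7.33, and a TLG of about 11.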
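As an illustration of the 64-bin resampling and of a histogram-based entropy such as SUV entropy, the following sketch may help. It does not reproduce the CGITA code: the equal-width binning between the VOI minimum and maximum, the base-2 logarithm, and the function names are assumptions.

```python
import math
from collections import Counter


def resample(values, n_bins=64):
    """Map intensities to integer bins 0..n_bins-1 (equal-width, assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    scale = n_bins / (hi - lo)
    # The maximum value would land on n_bins, so clamp it to the last bin
    return [min(int((v - lo) * scale), n_bins - 1) for v in values]


def histogram_entropy(bins):
    """Shannon entropy (base 2, assumed) of the bin-occupancy histogram."""
    n = len(bins)
    counts = Counter(bins)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Four voxels resampled into four bins occupy one bin each
toy_bins = resample([0.0, 1.0, 2.0, 3.0], n_bins=4)
toy_entropy = histogram_entropy(toy_bins)
```

A VOI whose voxels all fall into one bin has an entropy of 0, whereas voxels spread evenly over k bins give log2(k); larger values therefore correspond to more heterogeneous uptake, which is the reading of entropy used in this study.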

Statistical analysis
Because most texture features showed a skewed distribution, the non-parametric Wilcoxon signed-rank test was used for paired comparisons of PET/HCT and PET/ACT parameters. The agreement between texture features in PET/HCT and PET/ACT was examined using intraclass correlation coefficients (ICC). Precision was defined as half of the width of the 95% confidence interval (CI) and used as an indicator of reliability. Bland-Altman analysis was used for comparing the two measurements. The differences between the two parameters (i.e., PET/ACT values minus PET/HCT values) were plotted against their average (i.e., the mean of PET/HCT and PET/ACT values) and reported as percentages. The lower and upper reproducibility limits (LRL and URL, respectively) were calculated as ± 1.96 standard deviations (SD). Variations were defined as the range between LRL and URL.
In an exploratory analysis, the evaluation of texture parameters was based on a step-forward process. The median follow-up time in the entire study cohort was 26.2 months (range: 2.5−74.8 months), whereas it was 59.0 months (range: 40.4−74.8 months) in patients who survived. Because all of the enrolled cases were followed up for at least 3 years or until death, receiver operating characteristic (ROC) curves were initially used to identify the image features associated with 3-year DSS. All of the parameters with an area under the curve different from 0.5 were selected for further analyses. The optimal cutoff values were identified by determining the point where the sum of sensitivity and specificity (Youden's index) was maximum. Patients were then dichotomized according to the optimal cutoff values for the subsequent univariate and multivariate Cox regression analyses. Because of the high collinearity among different texture features, we constructed multivariate Cox regression models to include only one texture parameter and the following covariates: age, cell type (adenocarcinoma vs. non-adenocarcinoma), AJCC stage (stage I, II vs. stage III), and radical surgery (yes vs. no). All calculations were performed with the PASW Statistics 18 software package (SPSS Inc., Chicago, IL, USA). After application of the Bonferroni correction, a P value < 0.017 (i.e., 0.05/3) was considered statistically significant.
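Two of the steps described in this section, the percentage Bland-Altman limits and the Youden-based cutoff, can be sketched as follows. This is an illustrative reconstruction, not the PASW procedure: centering the limits on the mean percentage difference, using the sample SD, and assuming that higher parameter values indicate an event are all assumptions, as are the function names.

```python
from statistics import mean, stdev


def bland_altman_percent(act, hct):
    """Return (bias, LRL, URL, variation), all in percent."""
    # Difference (ACT minus HCT) relative to the mean of the two values
    diffs = [100.0 * (a - h) / ((a + h) / 2.0) for a, h in zip(act, hct)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (assumed)
    lrl = bias - 1.96 * sd
    url = bias + 1.96 * sd
    return bias, lrl, url, url - lrl


def youden_cutoff(values, events):
    """Cutoff maximizing sensitivity + specificity - 1.

    A case is called positive when value >= cutoff (assumed direction).
    """
    pos = sum(1 for e in events if e)
    neg = len(events) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both outcome classes are required")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if e and v >= cut)
        tn = sum(1 for v, e in zip(values, events) if not e and v < cut)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


bias, lrl, url, variation = bland_altman_percent([11.0, 10.0], [9.0, 10.0])
cutoff, j_index = youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

Whichever value the limits are centered on, the variation (URL minus LRL) equals 2 × 1.96 × SD of the percentage differences, so that figure does not depend on the centering assumption.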

Results

Patients
Between July, 2007 and June, 2009, a total of 56 consecutive patients (36 males, 20 females; median age: 68 years; age range: 34−84 years) were enrolled. The clinical characteristics of the study participants are shown in Table 1. The most common histological type was adenocarcinoma (n = 31, 55.4%), and the majority of patients were diagnosed at advanced stages (stage IIIA or IIIB, n = 36, 64.2%). Twenty-two (39.3%) tumors were located in the lower pulmonary lobes, whereas 34 (60.7%) were located in the upper or right middle lobes. A total of 29 patients (51.8%) received radical surgery. The median glucose level before FDG PET imaging was 93 mg/dL (range: 65−151 mg/dL). There were no significant interobserver differences in background activity for both PET/HCT (observer 1 vs.

PET image analysis
The results of Wilcoxon signed-rank tests revealed that PET/ACT yielded significantly higher SUVmax, SUVmean, and TLG values. However, none of the texture parameters showed significant differences (Table 2). Specifically, SUVmax values of tumors located in the lower lung were significantly higher in PET/ACT, but similar values were noted for tumors arising in other sites (S1 Table). Despite differences in SUVmax, SUVmean, and TLG between PET/HCT and PET/ACT, ICC analysis revealed a high degree of correlation and good precision (ICC: 0.993, 0.994, and 0.993; precision: 0.35, 0.35, and 0.40% for SUVmax, SUVmean, and TLG, respectively). HGLZE showed the lowest levels of correlation and precision (0.919 and 4.35%, respectively). High correlation coefficients (ICC > 0.95) were generally noted for other texture features (Table 3). The variations of SUVmax and SUVmean in Bland-Altman analysis were 25.4% and 18.1%, respectively. The lowest level of variation was evident for SUV entropy (9.7%), followed by entropy (9.8%), as shown in Fig 1. Among NGTDM and GLSZM parameters, coarseness and GLNU had the lowest degree of variation (33.0% and 45.2%, respectively). The highest levels of variation were noted for contrast (104.9%) and HGLZE (80.6%) (S1 Fig).
[...] used to investigate the prognostic role of texture parameters. The results of ROC curve analyses (S2 Table) revealed that SUV entropy, uniformity, entropy, coarseness, contrast, GLNU, and ZSNU from PET/HCT and PET/ACT were statistically significant. When patients were dichotomized according to the optimal cutoff values, we found a high consistency between texture parameters derived from PET/HCT and PET/ACT. Identical stratification results were obtained with regard to entropy and coarseness; 55 of 56 cases (98.2%) were stratified consistently based on the cutoff for GLNU, 54 (96.4%) for ZSNU, 53 (94.6%) for uniformity, and 50 (89.3%) for SUV entropy.
ICC, precision, and variation did not show significant associations with stratification consistency (Spearman's ρ = -0.211, 0.266, and -0.248; P = 0.559, 0.457, and 0.490, respectively). Subsequent analyses using univariate and multivariate Cox models confirmed the significant role of SUV entropy, entropy, and coarseness from both PET/HCT and PET/ACT in the prediction of DSS.

Validation data
MTV and TLG delineated using the T45 method were significantly lower than those obtained using the adaptive threshold approach (Wilcoxon signed-rank test, both P < 0.001). Significantly higher SUVmax, SUVmean, and TLG values were also evident (S3 Table), whereas none of the texture parameters showed statistically significant differences. High ICCs were identified. SUV entropy and entropy showed the lowest degrees of variation between PET/ACT and PET/HCT, whereas contrast and HGLZE continued to show large variations (S4 Table). Using the T45 method for validation purposes, the predictive role of SUV entropy and entropy from both PET/HCT and PET/ACT for DSS was confirmed using all three methodologies (ROC curves as well as univariate and multivariate Cox regression analyses). However, the role of coarseness remained significant only for PET/ACT (Table 5). Kaplan-Meier estimates of DSS for these analyses are reported in Fig 3.

Discussion
The quantification of PET images relies on accurate attenuation correction maps. However, respiratory motion remains a major challenge for PET/CT imaging. PET/HCT misregistration occurs when HCT imaging is performed during inspiration [27–29,51]. Accordingly, displacement of the diaphragm by air-filled lung tissue results in an underestimation of the attenuation coefficient. Because PET/ACT can effectively reduce this issue, higher SUVmax values were identified in PET/ACT (especially for lesions located in the lower lung, where a more marked respiratory motion was expected). In contrast, SUVmax values did not differ significantly at sites other than the lower lung or in the background. Notably, texture indices obtained from PET/ACT and PET/HCT were largely similar even in the lower lung fields. Texture features are calculated using whole-tumor sampling and are useful for assessing the relationships between multiple voxels and their neighborhood (rather than a single voxel). Therefore, they are generally consistent even when different attenuation correction methods are used.
PET entropy has been shown to predict survival in patients with early-stage NSCLC [48]. In the current study, we were not only able to replicate this finding but also showed that heterogeneous PET images were associated with unfavorable DSS. Heterogeneous images were characterized by larger values of SUV entropy and entropy. Notably, SUV entropy and entropy based on NGLCM from both PET/HCT and PET/ACT were significant predictors of survival. Moreover, SUV entropy and entropy remained consistent regardless of different PET reconstruction parameters (iteration number, FWHM, and pixel size) [49] and attenuation correction methods. NGTDM was originally developed to quantify human visual perception. A coarse image reflects the presence of a uniform intensity distribution, i.e., a homogeneous image.
Although a previous study demonstrated a prognostic role of coarseness [8], this parameter shows a high extent of variation according to different segmentation methods, PET acquisition modes, and image reconstruction settings [45,49,52,53]. As expected, coarseness values in this study were characterized by a marked extent of variation according to the attenuation correction method (33.0% and 55.0% for the adaptive threshold and T45 methods, respectively). Notably, coarseness was significantly associated with DSS using the adaptive threshold method. However, only a marginal association was observed when the T45 delineation without motion compensation was applied. In contrast, SUV entropy and entropy (which were characterized by a lower extent of variation) were significant in both PET/ACT and PET/HCT. In contrast to our results, Yip et al. [54] reported significant differences between PET/HCT and 4D PET/CT for coarseness and busyness values. In 4D-PET/CT, 4D-CT images of five different respiratory phases are selected to match those of the corresponding 4D-PET acquired following PET/HCT. Consequently, the count of 4D-PET photons is different from that of PET/HCT, with a higher noise being evident for 4D PET (which may hamper the precise calculation of the texture features). The question as to whether such differences may have an impact on survival prediction remains open. Unlike 4D-PET/CT, PET/ACT uses the same PET images, and all of the phases of PET signals are utilized. This observation may explain the limited differences in terms of texture parameters between PET/HCT and PET/ACT.
Our findings emphasize the importance of using a standardized approach for PET texture analysis [55] in NSCLC patients. It may be argued that the diversity of PET texture parameters (resulting from differences in target segmentation, rebinning process, reconstruction settings, and/or terminology) may hamper the application of texture-based analysis in clinical practice [53]. However, technical advances and the implementation of randomized clinical trials will hopefully be helpful in overcoming such challenges [2,56]. It is also possible that texture parameters (especially SUV entropy and entropy) can be useful in guiding radiotherapy. In this regard, it would be clinically relevant to assess the value of dose painting using unfixed radiation dose distribution to the tumor (based on image-guided stratification of high-risk target volumes). Although dose painting based on PET images with high SUV does not seem to be clinically useful for predicting outcomes [57], the identification of high-risk subvolumes based on PET texture parameters or multiparametric imaging [58] may warrant further investigation. Several limitations of our study merit comment. Because of the retrospective nature of the study, a selection bias cannot be excluded. Our results related to the predictive value of SUV entropy and entropy need to be independently validated in longitudinal studies. Another caveat inherent in our study is the use of two methods (i.e., adaptive threshold and T45) for tumor delineation. Further studies are necessary to clarify the potential impact of the tumor delineation method on the prognostic significance of the indexes. Finally, we did not specifically investigate the impact of the reconstruction algorithms on the texture features [45] and their effect on survival.

Conclusions
The results of our study indicate that texture features obtained with PET/HCT and PET/ACT showed limited differences and good levels of agreement regardless of the delineation method used. We also showed that texture parameters from PET/ACT are clinically useful in the prediction of survival in NSCLC patients and that SUV entropy and entropy are robust to attenuation correction methods.