Are Quantitative Features of Lung Nodules Reproducible at Different CT Acquisition and Reconstruction Parameters?

Consistency and duplicability in Computed Tomography (CT) output is essential to quantitative imaging for lung cancer detection and monitoring. This study of CT-detected lung nodules investigated the reproducibility of volume-, density-, and texture-based features (outcome variables) over routine ranges of radiation dose, reconstruction kernel, and slice thickness. CT raw data of 23 nodules were reconstructed using 320 acquisition/reconstruction conditions (combinations of 4 doses, 10 kernels, and 8 thicknesses). Scans at 12.5%, 25%, and 50% of protocol dose were simulated; reduced-dose and full-dose data were reconstructed using conventional filtered back-projection and iterative-reconstruction kernels at a range of thicknesses (0.6-5.0 mm). Full-dose/B50f kernel reconstructions underwent expert segmentation for reference Region-Of-Interest (ROI) and nodule volume per thickness; each ROI was applied to 40 corresponding images (combinations of 4 doses and 10 kernels). Typical texture analysis metrics (including 5 histogram features, 13 Gray Level Co-occurrence Matrix, 5 Run Length Matrix, 2 Neighboring Gray-Level Dependence Matrix, and 2 Neighborhood Gray-Tone Difference Matrix) were computed per ROI. Reconstruction conditions resulting in no significant change in volume, density, or texture metrics were identified as "compatible pairs" for a given outcome variable. Our results indicate that as thickness increases, volumetric reproducibility decreases, while reproducibility of histogram- and texture-based features across different acquisition and reconstruction parameters improves. In order to achieve concomitant reproducibility of volumetric and radiomic results across studies, balanced standardization of the imaging acquisition parameters is required.


I. INTRODUCTION
Lung nodules have traditionally been evaluated with two-dimensional (2-D) linear measurements on chest Computed Tomography (CT) (e.g., Response Evaluation Criteria in Solid Tumors (RECIST) [1]). However, three-dimensional (3-D) volumetric assessments of lung nodules are gaining importance due to: 1. improved representation of disease extent and therapeutic response; and 2. less user dependency and higher reproducibility of the results [2]. Concurrently, new diagnostic and treatment paradigms increasingly emphasize the value of quantitative radiomic features of lung nodules and surrounding lung tissue as indicators of tumor type, aggressiveness, and/or responsiveness to treatment [3], [4]. The associated quantitative metrics assume adequacy and uniformity in CT data acquisition and reconstruction despite well-recognized wide inter-scan variability in: 1. examination protocoling for image acquisition; 2. image reconstructions and displays; and 3. CT equipment and technologist capabilities and performances. Although the susceptibility of several lung nodule measurements (volumetric or radiomic) to variations in individual CT acquisition parameters has already been recognized based on preliminary studies of clinical or phantom data [5]-[8], the concomitant effects that routine ranges of radiation dose, reconstruction kernel, and slice thickness have on nodule volume and texture features have not yet been reported.
The purpose of this research was to: 1. determine the impact of a variety of standard imaging acquisition and reconstruction parameters on lung nodule volumes and radiomic features; and 2. identify potential imaging acquisition parameters that allow consistency and reproducibility of volumetric and/or radiomic features of lung nodules.
For each chest CT examination, sets of 100%-dose/FBP B50f-kernel images were reconstructed at the aforementioned 8 different slice thicknesses, with nodule extent and shape previously independently confirmed by consensus between the two dedicated thoracic radiologists. This resulted in 8 reference Region-Of-Interest (ROI) stacks, each corresponding to a specific slice thickness, which were then applied to the 40 different dose-kernel combinations (i.e., 4 doses x 10 kernels) at constant slice thickness (Fig. 2).
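The grouping described above can be sketched in a few lines of Python. This is only an illustrative enumeration, not the authors' pipeline: the dose, kernel, and thickness values are the ones listed in the compatibility-map description of this paper, while the variable and function names are hypothetical.

```python
from itertools import product

# Values as listed in the paper; names below are illustrative assumptions.
DOSES_PERCENT = [100.0, 50.0, 25.0, 12.5]
KERNELS = ["I26f", "I31f", "B31f", "I40f", "B40f",
           "I50f", "B50f", "I70f", "B60f", "B70f"]
THICKNESSES_MM = [5.0, 4.0, 3.0, 2.0, 1.5, 1.0, 0.75, 0.6]


def conditions_by_thickness():
    """Map each slice thickness to the dose-kernel combinations that
    share its reference ROI stack."""
    return {t: list(product(DOSES_PERCENT, KERNELS)) for t in THICKNESSES_MM}


groups = conditions_by_thickness()
total = sum(len(combos) for combos in groups.values())
print(len(groups), "reference ROI stacks;", total, "conditions in total")
```

Each dictionary key corresponds to one reference ROI stack, and each value holds the 40 dose-kernel images to which that stack would be applied.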

C. Analysis of Reproducibility of Volumetric Measurements
Using automatic 3-D segmentation, 8 volumetric measurements were obtained for each nodule (Fig. 3.a), one per slice thickness (Fig. 3.b), and normalized by their average using (1), where $V_i$ is the nodule volume at the i-th thickness:

$$\hat{V}_i = \frac{V_i}{\frac{1}{8}\sum_{j=1}^{8} V_j} \qquad (1)$$

Using the distributions of normalized volumes for each slice thickness (Fig. 3.c), t statistics were calculated with a 2-tailed t-test (2) to evaluate the compatibility of slice thicknesses based on volumetric measurements, where $\mu_1, \mu_2$ are the means, $\sigma_1, \sigma_2$ the standard deviations, and $n_1, n_2$ the numbers of samples (nodules):

$$t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \qquad (2)$$

If |t| < 1.96 (no significant difference at the P = 0.05 level), the two slice thicknesses are accepted as compatible at the 95% confidence level.
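As a minimal sketch of this test, the following Python normalizes each nodule's volumes by that nodule's average and compares two samples with a two-sample t statistic. The unpooled standard error and the use of sample (ddof=1) variances are assumptions on our part, and the function names and toy numbers are hypothetical.

```python
import numpy as np


def normalize_volumes(volumes):
    """volumes: array of shape (n_nodules, n_thicknesses).
    Divide each nodule's volumes by that nodule's mean volume."""
    volumes = np.asarray(volumes, dtype=float)
    return volumes / volumes.mean(axis=1, keepdims=True)


def t_statistic(a, b):
    """Two-sample t statistic with an unpooled standard error (assumed form)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


def compatible(a, b, threshold=1.96):
    """True when the two samples do not differ significantly."""
    return abs(t_statistic(a, b)) < threshold


# Toy data: 3 nodules x 2 thicknesses (made-up volumes in mm^3).
toy = normalize_volumes([[100.0, 110.0], [200.0, 190.0], [50.0, 50.0]])
print(toy.mean(axis=1))  # each row averages to 1 after normalization
```

With real data, `a` and `b` would be the columns of the normalized volume array for the two slice thicknesses being compared.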

E. Compatibility of Image Reconstruction Conditions Based on Radiomic Features
Means and standard deviations of the aforementioned 28 image features were computed for each of the 320 segmented images of each nodule. Statistically significant changes in image features were evaluated using a 2-tailed t-test. Reconstruction condition pairs with |t| lower than 1.96 (no significant difference at the P = 0.05 level) are accepted as compatible at the 95% confidence level. Compatibility ratios were calculated from the resulting pairwise comparisons. We produced a compatibility map of reconstruction conditions (Fig. 4) to highlight the effects of changes in slice thickness (T), kernel sharpness (K), and dose (D). Reconstruction condition parameters are sorted by total number of compatibilities. Thickness order: 5 mm, 4 mm, 3 mm, 2 mm, 1.5 mm, 1 mm, 0.75 mm, 0.6 mm. Kernel order: I26f, I31f, B31f, I40f, B40f, I50f, B50f, I70f, B60f, B70f. Dose level order: 100%, 50%, 25%, 12.5%.
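One way such a compatibility map and per-condition ratio could be computed is sketched below for a single feature. This is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, a common sample count per condition is assumed, standard deviations are assumed to be non-zero, and excluding self-comparisons from the ratio is our choice.

```python
import numpy as np


def pairwise_compatibility(means, sds, n, threshold=1.96):
    """Boolean matrix C with C[i, j] True when conditions i and j show no
    significant difference in one feature (|t| < threshold).
    means, sds: per-condition feature mean and SD; n: samples per condition."""
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    diff = means[:, None] - means[None, :]
    se = np.sqrt(sds[:, None] ** 2 / n + sds[None, :] ** 2 / n)
    return np.abs(diff / se) < threshold


def compatibility_ratio(C):
    """Percentage of the other conditions each condition is compatible with
    (self-comparisons excluded, which is an assumption)."""
    C = np.asarray(C, dtype=bool)
    k = C.shape[0]
    off_diagonal = C.sum(axis=1) - np.diag(C).astype(int)
    return 100.0 * off_diagonal / (k - 1)


# Toy example with three conditions and 23 samples each.
C = pairwise_compatibility([0.0, 0.1, 10.0], [1.0, 1.0, 1.0], n=23)
ratios = compatibility_ratio(C)
print(ratios)
```

Sorting the rows and columns of `C` by their total number of compatibilities would give an ordering analogous to the one described for the compatibility map.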

A. Reproducibility of Volumetric Measurements
Volume comparisons were performed across the 8 slice thicknesses for all 23 cases (Fig. 5). The slice thickness resulting in the least volumetric variability was 2 mm, with +0.39% ± 1.59 (mean ± SD) variation from the average volume. Changing reconstruction parameters (thickness, dose, or kernel sharpness) decreases compatibility. To obtain higher compatibility, changes to the reconstruction parameters should be applied carefully. For example, if thickness needs to be switched from 2 mm to 0.75 mm, softer kernels and/or higher dose levels are needed, as seen from the intersection of the two thicknesses (Fig. 6).

B. Reproducibility of Radiomic Features
A total of 320 reconstruction conditions were compared to each other on 28 texture features and 23 cases, totaling 65,945,600 (320x320x28x23) comparisons. As shown in Fig. 4, the highest average compatibility of 24.47% was achieved using the combination of highest slice thickness, smoothest kernel, and highest dose level (5mm/I26f/100%). Compatibility decreases with decreasing slice thickness, kernel smoothness, and/or dose level. The lowest average compatibility of 2.65% was at the combination of lowest slice thickness, sharpest kernel, and lowest dose level (0.6mm/B70f/12.5%). Fig. 7 shows percentage compatibilities of radiomic features based on dose, kernel, and thickness changes. The most robust feature against dose changes was density (87.45% compatibility), and skewness was the most robust against kernel and slice thickness changes (53.73%-82.51% compatibility). Deviation was the weakest feature for all cases. On average, the GLCM-based feature group was the most vulnerable feature group (19.41% average compatibility). In addition, results showing percentage compatibilities of conditions based on kernel sharpness, slice thickness, and dose levels are presented in Figs. 8, 9, and 10, respectively.
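The comparison count quoted above follows directly from the study design, as this short arithmetic check shows. The count of unique unordered pairs is included only for context and is not a figure reported in the paper; the variable names are ours.

```python
# Study design figures taken from the text.
n_doses, n_kernels, n_thicknesses = 4, 10, 8
n_features, n_cases = 28, 23

n_conditions = n_doses * n_kernels * n_thicknesses

# All ordered condition pairs, including self-comparisons, as in 320x320x28x23.
ordered_total = n_conditions * n_conditions * n_features * n_cases

# Unique unordered pairs of distinct conditions (for context only).
unique_pairs = n_conditions * (n_conditions - 1) // 2
unique_total = unique_pairs * n_features * n_cases

print(n_conditions, ordered_total, unique_total)
```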
As can be seen from the figures, stable volumetric measurements can be obtained (e.g., at 2-mm slice thickness) and volumetric measurement errors can be predicted when slice thicknesses are changed (Fig. 5), but that is not the case for radiomic features (Figs. 6-7). Texture measurements can be very unstable when conditions are altered.

IV. DISCUSSION
Robust image features are vital for designing and standardizing anatomic and radiomic-based diagnosis and prognosis decisions [9]- [13]. In this study, we have investigated the effects of image acquisition and reconstruction conditions on volume and radiomic features derived from chest CT scans of lung nodules. These conditions contained an extensive list of combinations (320 versions of each examination: 4 doses x 10 kernels x 8 thicknesses).
Lung nodule detection is enhanced when thinner CT slices are produced [14]- [16]; screening CT scans are preferentially performed with 1 to 2.5-mm-thick slices [14], [17], as was done in the National Lung Screening Trial [18]. Solid nodules > 4-mm in diameter are currently considered important [14] and were the focus of this research. Accurate and reliable measurement of lung nodule size from CT scans is a key biomarker in the diagnosis of lung cancer; the estimation of nodule growth rates serves as a predictor of malignancy and size change reflects efficacy of a treatment [19], [20].
A related challenge is the consistency in establishing lung nodule size [14]. Reliable sizing of nodules has traditionally been limited by subjectivity in selection of the desired dimensions to measure and non-uniformity in measuring by manual manipulation of digital calipers; this represents a source of disagreement between readers and reference standards [2], [14]. Manually measuring nodule size involves laboriously inspecting all images that include the lesion [19]. To provide a standard method for nodule-size measurement, the RECIST working group proposed the common use of the 1-D maximal diameter as an efficient standard estimator of lesion volume [1], [19], [21]. While such basic mass size measurements are typically used in clinical practice, 3-D volume measurements are growing in importance due to evidence that 3-D volumetry is more robust for quantifying tumor size [20], [22]. Poor agreement between 3-D and simpler methods is commonly seen when the nodule does not conform to spherical or ellipsoidal assumptions that underlie 1-D and 2-D measurements, respectively; in the context of spatial extent for a 3-D object without restrictions on shape, size is best expressed by the volume occupied by the object [19].
Consequently, there is increasing interest in computer-assisted methods aiding the radiologist in measuring the size of lung nodules using volumetric methods [23]-[26], despite the fact that their calibration and validation becomes a new challenge [19]. Although lung-nodule volumetry has the potential to improve patient management, there is considerable and largely unpredictable variability in its execution related to slice thickness, reconstruction algorithm, and scan dose [20], [22], [27], [28]. Our research addressed this issue and demonstrated that while increasing thickness decreases volumetric reproducibility, it improves the reproducibility of histogram- and texture-based features across different acquisition and reconstruction parameters.
Texture analysis is promising for phenotyping and segmenting cancerous tissues [6], [7]. However, Buch et al. [6] highlighted the major looming problem pertaining to radiomics and big data, that despite collecting increasing numbers of radiological images at an exponentially growing rate, the medical field is far from a completely data-driven artificially intelligent diagnosis. The main reason is a lack of data harmony across multi-site studies, which keeps the training data substantially low for a truly large-scale study.
Recognizing that homogenizing all CT acquisition parameters across practices may not be possible, our research goal is to provide a practical approach to this issue. Based on the data-mining approach we applied to our results across the various parameters, our recommendations are as follows: 1) For volumetric measurements, changes in slice thickness may produce acceptable errors; however, the effects on texture are the most dramatic. Hence, if possible, slice thickness should not be altered between studies when radiomic features are being compared. 2) If scanning/reconstruction changes are inevitable, they should be limited to a single parameter. For example, only dose level or only kernel sharpness should be changed (Figs. 8-10), and those changes should be kept to a minimum. Changing multiple parameters generally produces greater measurement errors (Fig. 4).
We performed a large-scale data-mining approach to find a "compatible" set of parameters; however, as can be seen from the results, compatibilities are very limited. In an earlier study, Young et al. [20] raised another concern pertaining to radiation dosage and kernel usage in CT lung-nodule volumetry; they found that the volume of lung nodules was extremely robust to the dose and reconstruction kernels. On the other hand, Chen et al. [8], [29]-[31] showed that the choice of reconstruction algorithm slightly affects the measurement of lesion volume. Regardless, reduced dosage comes at the cost of decreased image quality, which in turn makes the results less reproducible for textures. Lo et al. [7] discussed these specific issues pertaining to lung CT and lung nodule quantification. Many authors recognize that it is desirable to reduce patient dose levels, and many vendors are already working towards quantitative imaging based on reduced dose levels [32]-[35]. In addition, other reports indicated that iterative reconstruction algorithms offer an opportunity to substantially reduce radiation dosage in CT scans while maintaining good image resolution for visualization and nodule detection [8], [36]-[39]. However, the quantitative measurements from iterative algorithms can be significantly different from those of traditional filtered back-projection algorithms due to varying noise and resolution properties [40].
Based on earlier studies, high-resolution texture characterization requires image reconstruction using thin slices. However, thin slices also increase image noise; increasing slice thickness decreases the noise while inducing blur. Based on our results, increasing slice thickness decreases the reproducibility of volumetric measurements but increases the reproducibility of histogram- and texture-based features. Thus, CT reconstruction becomes an optimization problem, which corroborates our view based on these results that there might not be a universally best set of parameters that simultaneously covers both volumetric measures and radiomic features. Increasing slice thickness causes information loss due to the over-smoothing effect [41]. This may be the reason for the increased compatibility of texture features.
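The noise side of this trade-off can be illustrated with a toy model in which a thicker slice is emulated by averaging adjacent thin slices of pure Gaussian noise. This is a deliberate simplification and not how a scanner reconstructs thick slices; the 50 HU noise level, the array sizes, and the function name are hypothetical.

```python
import numpy as np


def thicken(volume, factor):
    """Average `factor` adjacent thin slices along axis 0 to emulate a
    thicker slice (toy model, not an actual CT reconstruction)."""
    volume = np.asarray(volume, dtype=float)
    usable = (volume.shape[0] // factor) * factor
    stacked = volume[:usable].reshape(-1, factor, *volume.shape[1:])
    return stacked.mean(axis=1)


rng = np.random.default_rng(0)
thin = rng.normal(0.0, 50.0, size=(64, 64, 64))  # noise-only "thin" slices
thick = thicken(thin, 4)                         # emulate 4x thicker slices

# For independent noise, averaging 4 slices should roughly halve the SD.
print(thin.std(), thick.std())
```

In this model the noise standard deviation falls roughly with the square root of the number of averaged slices, while any structure finer than the new slice thickness is blurred, which is consistent with the smoothing explanation offered above.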
Our methodology with full control of the reconstruction parameters had advantages and disadvantages. While limiting the number of scanners gave us the full advantage of image reconstruction algorithm compatibility and standardization, our results at this point are limited to a few scanner types (Siemens Definition Flash, Definition AS, AS Plus, Edge). However, this approach also helped us to point out the potential issues in terms of CT image reconstruction. At this point, we believe that it would be even more difficult to show compatibilities among multiple scanners from multiple vendors.
Another limitation was our relatively small sample size (n = 23). This was due to the fact that raw images can be stored in our scanners for only 2 to 4 weeks (depending on the scanner and due to space restrictions). The study was initially designed as a retrospective analysis of nodule measurements, rather than of nodule classification. Under our circumstances, the latter would not be possible with retrospective analysis, and we would have needed to conduct a prospective study, which could have taken much longer to complete.
Due to the small sample size, we only looked at the solid components of nodules under 2 to 2.5 cm. If a nodule had surrounding ground-glass tissue, that component was ignored (two of our cases had minor ground-glass features surrounding them), and only the solid components were included in the measurements and comparisons.
In conclusion, we found that slice thickness is the main factor impacting reproducibility of the image features we investigated. It is difficult to maintain both volumetric and radiomic measurement reproducibility and reliability simultaneously. Our findings indicate that volumetric measurement reproducibility can be maintained at a thickness of approximately 2 mm. For reproducibility of radiomic features in particular, however, image scanning and reconstruction protocols need to remain stable. Standardization of the imaging acquisition parameters would become even more important in larger-scale studies, where images are collected from multiple sites. As we have shown here, even with scanners with compatible image reconstruction parameters in a highly controlled environment, it is hard to maintain measurement reproducibility when parameters are arbitrarily changed. With images coming from multiple sites and multiple vendors, if studies are not designed and scanning protocols are not aligned properly, it can be very hard to produce reliable results that can be utilized within clinical studies.