Predicting Future Morphological Changes of Lesions from Radiotracer Uptake in 18F-FDG-PET Images

We introduce a novel computational framework to enable automated identification of texture and shape features of lesions on 18F-FDG-PET images through a graph-based image segmentation method. The proposed framework predicts future morphological changes of lesions with high accuracy. The presented methodology has several benefits over conventional qualitative and semi-quantitative methods, due to its fully quantitative nature and high accuracy in each step of (i) detection, (ii) segmentation, and (iii) feature extraction. To evaluate our proposed computational framework, thirty patients received 2 18F-FDG-PET scans (60 scans total), at two different time points. Metastatic papillary renal cell carcinoma, cerebellar hemongioblastoma, non-small cell lung cancer, neurofibroma, lymphomatoid granulomatosis, lung neoplasm, neuroendocrine tumor, soft tissue thoracic mass, nonnecrotizing granulomatous inflammation, renal cell carcinoma with papillary and cystic features, diffuse large B-cell lymphoma, metastatic alveolar soft part sarcoma, and small cell lung cancer were included in this analysis. The radiotracer accumulation in patients' scans was automatically detected and segmented by the proposed segmentation algorithm. Delineated regions were used to extract shape and textural features, with the proposed adaptive feature extraction framework, as well as standardized uptake values (SUV) of uptake regions, to conduct a broad quantitative analysis. Evaluation of segmentation results indicates that our proposed segmentation algorithm has a mean dice similarity coefficient of 85.75±1.75%. We found that 28 of 68 extracted imaging features were correlated well with SUVmax (p<0.05), and some of the textural features (such as entropy and maximum probability) were superior in predicting morphological changes of radiotracer uptake regions longitudinally, compared to single intensity feature such as SUVmax. We also found that integrating textural features with SUV measurements significantly improves the prediction accuracy of morphological changes (Spearman correlation coefficient = 0.8715, p<2e-16).


Introduction
Positron Emission Tomography (PET) is a non-invasive functional imaging method that captures the distribution of biologically targeted radiotracers at the molecular level, with high sensitivity [1]. Standardized uptake value (SUV) is often used in clinical PET imaging as a semi-quantitative, functional measurement of radiotracer activity, normalized for dose and body weight (or lean body mass or body surface area). Recent investigations have aimed to improve the characterization of radiotracer uptake patterns in order to analyze lesions [2][3][4][5]. These efforts to characterize patterns of uptake are based on the limitation of SUV measurements such as inconsistent cut-off values for discriminating benign and malignant activity, partial volume effects, body composition, and habitus. Note that SUVs are linearly related to image intensities through patient and scanner specific parameters as well as kinetics of the radiotracer. Although parametrically related, different formulations of SUVs (i.e., SUV max , SUV mean , etc) are used to overcome the current limitations of SUV measurements [3], and comprehensive analyses of local to global textural and shape characterization of uptake regions remain unaddressed. Extracting characteristic texture/shape features from uptake regions require robust, accurate, and reliable medical image segmentations; however, primarily due to overlap or close juxtaposition of abnormal signals, with surrounding normal structures, background radiotracer activity, image reconstruction-based artifacts, partial volume effects, low resolution, etc., the PET image segmentation can be a challenging problem. Many studies using segmentation of PET images are performed using manual approaches-fixed-, adaptive-, or iterative-thresholdingand region based methods such as fuzzy c-means (FCM), region growing, or watershed segmentation methods [6][7][8][9][10]. However, all these methods have limitations in clinical practice because of the following restrictions: (i) desired physical accuracy is usually far beyond the outputs of the methods, particularly for small lesions and uptakes with non-spherical shapes, and (ii) robustness and reproducibility of delineations are two unsolved problems in segmentation of uptake regions from PET images because an algorithm working in different signal-to-background ratio condi-tions-with similar performance and outputting the same/similar results consistently-is missing.
Our aims in this study are to explore imaging features that may potentially drive morphological characterization of radiotracer uptake and reliably predict morphological changes of abnormal regions. Our investigation produced a robust, accurate, and efficient image segmentation method, which enables a comprehensive texture analysis possible. The relationship of both textural and shape features to intensity based (i.e., SUV) features were also analyzed using multivariate and Bayesian statistics. In this paper, we present the theoretical analysis of textural characterization and image segmentation methods, in addition to experimentally demonstrating that the proposed texture based features-extracted from accurately delineated radiotracer uptake regions-can potentially be used as semi-quantitative tools in analyzing longitudinal morphological change analysis. The combination of SUV max and the proposed textural features are hypothesized to predict morphological changes of abnormal regions more efficiently. The proposed methods were used to detect and identify lung abnormalities, pertaining to patients who had PET-CT scans and histopathology biopsy. Longitudinal analyses of these patients were used to evaluate the generalizability and consistency of the proposed method. Although changes in uptake or SUVs can be used as a quantitative index for treatment responses, in this study we confine ourselves into only morphological changes and prediction of these changes in image space with the aim of developing a quantitative and reliable computational platform.

Patients and PET-CT Imaging
With IRB approval, we collected 60 18 F-FDG-PET imaging scans from 30 patients. The study population consisted of 12 males and 18 females, with a mean age of 48612.6 for female (range: 35-75, median: 45 years), 44614.5 for male (range: 27-64, median: 47 years), respectively. All the patients presented with either primary non-metastatic, metastatic disease, or a systemic viral infection at the time of the first PET scan. The study group consisted of non-consecutive patients diagnosed with primary lung cancer (NSCLC and SCLC), diffuse large B-cell lymphoma (DLBCL), metastatic papillary renal cell carcinoma, cerebellar hemongioblastoma, neurofibroma, lymphomatoid granulomatosis, lung neoplasm, neuroendocrine tumor, soft tissue thoracic mass, nonnecrotizing granulomatous inflammation, renal cell carcinoma with papillary and cystic features, or metastatic alveolar soft part sarcoma. All 30 patients underwent an 18 F-FDG-PET/CT protocol, where patients were instructed to fast for a minimum of 6-hours before scanning. The serum glucose level was measured to ensure that the value was less than 118 mg/dL (6.5 mmol/l). At the end of the 6 hour period, 321.9-395.9 MBq (8.7-10.7 mCi, median 10.2 mCi) of 18 F-FDG was administered intravenously to the patients, followed by a 45-60 minute uptake period, before image acquisition (mean uptake period = 54.5 mins, minimum uptake period = 45 mins, maximum uptake period = 60 mins). For the analysis of longitudinal studies, the deviation of 18 F-FDG uptake periods between the baseline and follow-up scans must be within +/2 10 minutes [11], and our study had a mean deviation of less than one minute. The 18 F-FDG uptake period deviation between the baseline and follow-up scans was as follows: 22 patients less than 1 min, 7 patients approximately 2 mins, and only one patient had a difference of 4 mins; hence, no significant differences were observed in uptake times between baseline and follow-up scans. Moreover, mean variation of administrated 18 F-FDG (over all patients) between baseline and follow-up scans was measured as 1.05 mCi. PET images were acquired with 2-3 minutes of emission scan per bed for 5-6 bed positions with 3D acquisition mode. Corresponding non-diagnostic low dose CT was obtained for attenuation correction and anatomic localization. PET-CT Images were collected in two different time points (baseline and follow-up; mean time interval between scans was 267 days, median: 206 days, ranging from 64 to 719 days with multiple scans). The images were 1506150 pixels resolution, corresponding to 4 mm64 mm pixel size and 4 mm slice spacing. Each patient's baseline and follow-up scan was carefully analyzed, and during the computational and SUV based analysis, up to five lesions were taken into account and tracked longitudinally (Table 1). Since not all patients were having multiple lesions, in order to avoid any bias towards small/big size or regular/irregular shaped lesions, we tracked as many lesions as possible from patients for longitudinal quantification. Follow-up scans of patients were obtained immediately after five chemotherapy cycles to be consistent in the evaluations, and we used the response evaluation criteria in solid tumors (RECIST) since it suggested the use of five lesions per organ (up to maximum 10 lesions) for analysis. Note also that, patients having secondary severe symptoms and complications such as kidney failure during these five cycles were not included in the selection procedure and hence in the study.

Analysis of Uptake Regions Using Textural and Shape Features
Texture analysis provides quantitative information describing properties in images such as coarseness and smoothness. The search for useful textural features and discriminative statistics in image processing field has significantly progressed throughout the last three decades [12]. Co-occurrence matrices [13], run-length statistics [14], local shapes [15], and cliques in Markov random fields [16], as well as many extensions of these landmark features are well-established in various disciplines. Parallel to these developments, in recent publications, textural and shape features of uptake regions were used to characterize esophageal cancer in [3], human sarcomas in [15], cervix, head and neck cancers in [4]. In particular, local tissue characteristics provided by PET and modeled by textural heterogeneity-by computer algorithmswere explored to understand the biological function of different tissues. However, in practice, the aforementioned computational methods used for analyzing functional uptake in PET images do not provide a general way for reliable inference, due to the highly possible segmentation errors and difficulties in characterization of global and local features. Note also that inaccurate delineation of uptake regions may cause to considerable changes in extracted features. Last but not least, local variations of feature values were usually ignored or not taken into account in such studies [5,13,14,[17][18][19]. However, we postulate that local variations of feature values might be more effective than the features themselves in terms of correlation levels. In this study, we addressed all of these problems in two steps: (i) by proposing a robust, accurate, and fast segmentation method, as described in next subsection, and (ii) by broadly and deeply analyzing different textural and shape features, as well as their local deviations, from accurately delineated regions. Figure 1 shows feature types and associated features, extracted from delineated uptake regions from PET images. Brief descriptions of the features are explained in the following subsections.

Automated Random Walk (ARW) Image Segmentation
When images are low resolution and include noise, graph based segmentation algorithms were shown to be more useful than boundary and thresholding based segmentation methods [20][21][22][23].
PET images, as a nature of the reconstruction process, are low resolution images with high contrast and include noise; therefore, graph based segmentation algorithms are more suited for radiotracer uptake segmentation. We used an adaptive graph theoretic segmentation algorithm-automated random walk (ARW) image segmentation-in order to produce automated, efficient, and reproducible object delineation results from PET images. ARW works as follows: first, object and background are roughly identified by using an automated interesting uptake region (IUR) algorithm, and then some voxels are labeled as either object or background region, accordingly. Second, the delineation algorithm is initiated to efficiently and quickly determine the label of the remaining unlabeled voxels. The proposed ARW determines the highest probabilities for assigning labels to voxels, by measuring the ''betweenness/togetherness,'' by initiating random walkers from a labeled voxel, and by reaching to the unlabeled voxel first by a random walker. The proposed method is different from the conventional random walk algorithm [24] in the following ways: (i) the proposed method is fully automated since it detects interesting uptake regions (IUR) automatically, and (ii) the proposed algorithm is performed based on the SUVs of voxels, and prior probability distributions of voxel SUVs were calculated using a robust kernel density estimation method [25] instead of using simple Gaussian assumptions. For (i), we automatically localized the seeds for object and background separation, based on the high contrast difference of PET images. We accomplished this identification step by defining an encoder function c (see equation 1.1), which is a threshold interval for PET images: where N[R and Nw1. Regions identified by the encoding function were considered as IURs. Once IURs were identified for each IUR, the voxels with the SUV max of that particular IUR were marked as foreground seeds (i.e., SUV IUR max ). Then, we explored its neighborhood through 8-connectivity graph labeling algorithm [26] to find voxels with values less than and equal to the SUV max / N, where N is pre-defined value greater than 1. Those voxels were marked as background seeds. Once foreground and background seeds were localized (i.e., automatic detection step), random walk image segmentation was initiated by these inputs. In all experiments, N was set to 2.5 as equal to the conventional clinical usage (i.e., 40% of SUV max is usually selected as thresholding value) [2]. For (ii), instead of using the pure intensity values of voxels, we adapted SUVs of voxels in ARW algorithm. In addition, during the computation of prior probability distributions of labeled (i.e., localized seeds) voxels, we used an adaptive kernel estimation method [25] to accurately compute the priors even though the number of labeled voxels were small. In the proposed detection approach, it is important to emphasize that the foreground seeds are localized based on the highest intensity values (i.e., SUV max ), whereas background seeds are localized with respect to the foreground seeds through a search algorithm. Since random walk segmentation only needs a few cues for foreground and background, and it is quite robust to a leaking problemcommonly seen in graph cut algorithms-a ''rough'' identification of the parameter N is sufficient to finalize the seeding process. Note also that segmentation as a whole can be considered as consisting of two related tasks: recognition and delineation. Recognition is the process of determining roughly ''where'' the objects are, and it distinguishes them from other object-like entities in the image, while delineation is the final step for defining the spatial extent of the object region/boundary in the image. This recognition task coincides well in our detection algorithm, which roughly identifies IURs and feeds this information to random walk delineation to make it fully automated. Additional information and experimental validations on automatic detection of IURs can be found in Appendix S1.
Random Walks for Image Segmentation. Lets represent an image as a weighted undirected graph (G~(V ,E), v[V and e[E), with its nodes/vertices (v i ) as voxels and edges (e j ), defined as voxel adjacency with cost values assigned to edges (w ij ). We used the un-normalized Gaussian weighting function to define edge weights as w ij~e xp({(g i {g j ) 2 ), where g i represents the SUV of voxel i. By the convention of detected IURs, some of the vertices of the graph were known (denoted by V M ), and some were not known (denoted by V U ), such that V M |V U~V and V M \V U~1 . The segmentation problem was reduced to finding labels of unlabeled vertices (nodes). A combinatorial formulation of this situation could be written as a Dirichlet integral as: where C was the diagonal matrix with the weights of each edge along the diagonal, and A and L( = A T CA) were incidence and Laplacian matrices indicating combinatorial gradients, and defined as The solution of the combinatorial Dirichlet problem may be determined by finding the critical points of the systems. Differentiating D[x] with respect to x and solving the system of linear equations with |V U | unknowns yielded a set of labels for unlabeled vertices. All of these statistics can be derived from the histogram of voxel intensities in the images. Further characterization of the data variability can also be handled by incorporating higher-order statistics into the histogram analysis. For example, some of the histogram based features such as skewness, kurtosis, median absolute deviation (MAD), and interquartile range (IRQ) provide a natural bridge between images and a probabilistic description; however, estimation of a density profile from experimental data points is a challenging issue, especially because the number of data points is limited. Considering the studies that used histogrambased features for textural characterization of radiotracer uptake regions, accurate estimation of histogram features is often not possible. Therefore, we derived a histogram based on global features of textural regions, through kernel density estimation with diffusion approach [25]. This approach is an accurate and reliable non-parametric method, and it is able to deal with a small number of data points. Another important contribution that we have made was to capture local variations of global features. Since it is a wellknown fact that a region in an image has a constant texture-if a set of local statistics or other local properties of the picture function are constant-slowly varying, or approximately periodic [19], it was thus of interest to provide global statistics in a local sense in order to discriminate and characterize textures of region of interest. To achieve this, we extracted descriptive statistics and histogram based features from local patches (see Figure 2e), which we obtained after dividing automatically delineated radiotracer uptake regions into certain size non-overlapping blocks (i.e., 363, 565, 767, 969, and 11611 pixels size blocks were used, and the best block size was found to be 767 pixels and non-overlapping). We extracted all features from 2D sections of segmented 3D objects slice-by-slice and concatenated them (i.e., pseudo-3D) in a feature extraction order to avoid an additional slice sampling load and possible partial volume effects. The best window size was selected based on the highest value of the summation of mutual information (i.e., maximum mutual information: MMI) values over all local windows. Thus, we extracted global features in a local sense, and we computed the variations of these features over all the local regions (i.e., we had additional feature sets derived by computing standard deviation of computed global features such as SD of average intensities, SD of MAD, SD of IRQ, SD of kurtosis, etc.).

Gray Level Co-occurrence Matrix (GLCM) Based
Features. Descriptive statistics and histogram features depend on individual voxel values and not on the interaction or cooccurrence of neighboring voxel values; therefore, they suffer from the inability to encode spatial image variation. Since GLCM based features, in this sense, are second order statistics-estimating the spatial distribution of gray levels-GLCM based feature extraction methods have become one of the most well-known and widely used textural feature extraction methods for various different aims [13]. Some of the GLCM features used in our system included entropy, correlation, contrast, etc. The full list of features are listed in Figure 1. Entropy feature, for example, measures the amount of uncertainty (disorder) in the image. On the other hand, the maximum probability feature (MAX.PR) measures summation of the likelihood of voxels having the most common value for a given region. GLCM features help extract complex image properties by considering spatial variations of voxels pertaining to particular regions of interest. However, in most of the literature about radiotracer uptake characterization, not only the local deviations of these features were ignored, but also the optimal window size for extracting local and global features was not investigated. To tackle this problem, we divided the uptake regions into local regions, as explained in the previous subsection, and then we found the best window size for local and global analysis, by conducting correlation analysis of local regions inside the uptake regions (i.e., for different pre-defined window sizes, highest correlation value obtained among local regions was used to select the best window size). We then incorporated the local standard deviations of the extracted features into our proposed system for further characterization of the uptake regions. In the results section, we demonstrated that some of the textural features have lower correlations with SUV max than their variations. Run-Length Features. Run-length method is an effective texture analysis approach which examines the coarseness of a texture in a specific direction (i.e., number of runs with voxels of a particular gray level) [18]. Various texture features are derived from this information such as short run emphasis (SRE) or long run emphasis (LRE), etc. Basically, run length features are determined for the segmented image regions by taking into consideration the heterogeneity of these regions. The statistical properties of the run of a particular gray level in an image are significantly influenced by the size of the segmented regions; therefore, unlike the other studies reported in the literature [2,3,12,18], we adaptively selected the window size for analysis of the runs by examining the highest autocorrelation between different size of the local windows and probability distribution of each gray level's run-length feature. Figure 1 shows the complete list of run-length features used in our broad analysis. Note that it has been shown here and in the literature [18] that run length features possess as much discriminatory information as conventional texture features such as GLCM features. Please see [4,18] for technical details and further explanations on run-length features.
Gaussian Markov Random Field (GMRF) Features. Most medical images are Markov Random Field (MRF) images, that is, the statistics of a voxel in the medical image that are related to the statistics of voxels in its neighborhood [27,28]. A challenging problem in extracting suitable features from images is to extract robust features that are invariant to rotation and scaling. For instance, although multiple tumors with the same pathological findings may have different size and location within the image, extracted textural features are desired to have similar values that are independent of their size and location if characterization by texture is aimed. MRF, in this case, may offer a solution to this problem by providing a powerful tool to model the probability of spatial interactions in an image. By incorporating Gaussianity assumption to the MRF framework, we were able to extract rotation and scale invariant textural features from segmented uptake regions [17]. GMRF model was defined by the following equation [17,27]: where the equation denotes the probability of a voxel (x,y) having a specific gray value I xy given the values of its neighbors, n (n = 6 in this particular study) is the total number of pixels in the neighborhood Z xy and S xy;z denotes the summation of two symmetric pixels. We estimated the GMRF parameters (i.e., a 1 ,a 2 ,:::,a 6 and s 2 ) by using a least square error estimation method, similar to the study in [17]. Shape Features. The local relationship between fuzzy/solid objects and the intensity distributions-pertaining to those objects-is obtained through shape (geometric) features. There have been some techniques explored in [5,15] for evaluation of 18 F-FDG-PET utilization characteristics in human sarcomas such that a measure of heterogeneity incorporating tumor shape information was shown to be superior, compared to a measure of heterogeneity alone. Similarly, we encoded the 2D/3D boundaries of segmented regions and computed the ''circularity/ sphericity'' of those regions, as well as fractal geometry and volume information of those regions. These features were extracted from the 3D segmented radiotracer uptake regions. Extracted features were used to explore the correlation between functional information and the anatomical boundary of functional uptake. While volume (V) was computed by multiplying the voxel size with the number of voxels occupied in the uptake region, circularity (or sphericity in other words) is calculated as p denotes the surface area of the segmented region (i.e., voxels interior to the segmented objects are not counted in surface area computation). Sphericity measures disparity between the shape of an object and a perfect sphere (i.e., roundness). In addition, we also extracted fractal geometry of 3D segmented regions, where the fractal was defined as an object with the self-similarity property, i.e., it appears the same at different magnifications. Fractal measures are frequently used to understand underlying phenomena in different biomedical applications, including the cancer diagnosis. It provides information on the regularity and complexity of an object by quantifying its self-similarity level. We measured the fractal properties of the segmented objects by the box-counting method, as described previously in [17,29].

Evaluation of Segmentations
The dice similarity coefficient (DSC) [30] and Hausdorff distance (HD) [31] were used to evaluate segmentation accuracy, with respect to ground truth (i.e., surragate truth) provided by expert's manual delineations. Note that we use the term ground truth and surrogate truth interchangeably. Also, since our analysis includes only PET images, lesion volume should be regarded as functional volume only (functional volume is not necessarily equivalent to the tumor volume). Indeed, true tumor volume can only be validated with histopathology. While DSC is a measurement of spatial overlap (in percentage) between segmented object (lesion) and surragate truth (manually delineated lesion by experts), HD is a shape dissimilarity metric measuring the most mismatched boundary points between the segmented object and ground truth. High DSC and low HD values indicate goodness of the image segmentation method. Furthermore, we also analyzed inter-and intra-observer variations by DSC overlap ratios, since simple Pearson correlations can be misleading [32] (i.e., segmented volumes may have the same values although volumes may not overlap or overlap very little). Two expert radiologists delineated radiotracer uptake regions in three different time points (one week between each drawings, and blinded to each other's drawings). Each expert's drawings-in different time points-were used to compute intra-observer agreement ratios. Table 2 summarizes the evaluation of segmentation results for the proposed method compared to mean and individual delineation definitions of experts, as well as inter-and intra-observer agreements. Evaluation metrics (DSC and HD) are formulated and described in detail in the Appendix S1.

Exploring Connections among Extracted Features
We integrated all extracted texture, shape, and SUV max features into an unsupervised hierarchical clustering algorithm [33]. Our aim was to explore similarities and dissimilarities of features and to clarify hidden connections among features that can be integrated together in order to more accurately predict radiotracer uptake region morphological properties (without conducting any claim about clinical utility of these features) such as change in volume and shape (i.e., morphological characterization). The presence of clusters in a data set is frequently due to the existence of certain relationships between measured variables. Moreover, true group (class) membership is unknown to these variables. We conducted unsupervised clustering of the measured variables in order to explore true (or surrogate true) memberships. Euclidean distance dissimilarity measure with complete leakage method [34] were used to find highly correlated features and to contain them in similar clusters. Figure 5 demonstrates correlation analysis of all feature sets through a correlation matrix (whose column and rows shows the features), and dendogram graphics (i.e., hierarchical tree structures) for each feature were integrated into columns and rows of the correlation matrix. Similarly, we extracted the hierarchical tree structures only for features that have statistically significant correlation values with SUV max values. The resulting clustering scheme is illustrated in Figure 6. We repeated the same step for each type of feature separately (run-length, GLCM, etc., as shown in Figure 5 and 6) to clarify if the features were coming from significantly different class (membership) or not (from R = 21 (white) to R = 1 (dark blue), Figure 6). In particular, since SUV max is the current standard in quantification of uptake regions, we computed the Pearson correlations of all features with SUV max, and we reported only significantly correlating features in Table 3; however, one may introduce different quantification features to repeat this task. Among all features having significant correlations with SUV max , it is interesting that none of the features share the . An example surface pair obtained from segmented uptake regions (i.e., non-specific mass from lung regions of a particular patient) is shown. We parameterize the surface (a) of lesion using Euler angles of boundary points, and we colorized the surface points with respect to those angles in radians (b). This shape information (i.e., circularity) was used in longitudinal assessment of uptake changes. doi:10.1371/journal.pone.0057105.g004 Table 2. Evaluation of the proposed segmentation methods (via DSC and HD) and observer agreement ratios are given. same cluster that SUV max occupies, that is, those features are found to be informative in a semi-quantitative sense like SUV max itself. Another potentially important finding, observed in Table 3 and Figure 5, was that the standard deviation (SD) of some features-most of the GLCM features and some of the run-length features (i.e., LGRE.SD, GLN.SD, HGRE.SD, etc.)-were outperforming the features in correlation measurements. Note that some SD based features are coming from the local approach that we follow in the feature extraction. In Table 3, compared to the local approach, we also demonstrated the performance of global approach for textural analysis where, features were extracted from the segmented regions, without taking into account the local variations of the features within the scene. It is evident with this finding that not only do global heterogeneity of spatial features provide better associations among features, but local heterogeneity (SD) of both spatial and shape features also provide better correlations, as agreed with our initial assumption.

Correlation of Textural, Shape, and SUV max Features and Impact on Morphological Change Predictability
In longitudinal measurements of uptake regions, we tested the prediction power of each extracted texture feature for estimating morphological changes, including volume and circularity. Since morphological changes such as volume and shape may represent disease severity [5,15], our proposed technique may also be used in clinical tasks for predicting those morphologic factors using texture features combined with SUV. However, this requires a large spectrum of clinical data as well as ground truth from biopsy samples. In addition, we believe that associations of the image based features should be revealed before testing the proposed methodlogy for clinically more involved tasks. Therefore, we confine ourselves in this section to evaluate image based features and their prediction power analysis to build (near-) optimal associations among image features. We used shape features (i.e., circularity in particular) as our ground truth to test individual image features, without entirely relying on SUV max . In addition, we also added the feature ''change in volume of radiotracer uptake regions'' into our analysis to explore if there was a correlation with suggested informative features. Table 4 reports the results of an analysis in which textural and SUV max features were jointly and individually considered for possible relation to shape and volume changes longitudinally. We concluded from the results from Table 4 that volume change information does not have significant correlation with SUV max ; however, textural features correlate well with volume change information. Furthermore, combined SUVmax and textural features lead to an increase in the correlation ratios, compared to textural features or SUV max alone. Textural features showed superior correlation ratios to SUV max in all cases. Figure 7 shows histograms, pair-wise Spearman correlations and box-plots of selected five best features having the highest predictability values in patient outcome or changes in uptake region characteristics (i.e., SUV max , SD of contrast shade (CSHADE.SD), entropy, maximum probability (Max.PR), and SRE). For multiple variable selection and to use them in morphological change prediction, a simple logit transformation [32] was used so that parameters of the logit transformation were obtained through maximum likelihood estimation method. In order to validate both parameters of logit regression and prediction ability of the combined model, we used a leave one out cross validation (LOOCV) sampling technique. Circularity and volume, on the other hand, were combined through a simple multiplication operation, where lesions having the same volume/ circularity information were differed from each other with circularity/ volume information, respectively. In addition, we found that there is no significant volume and circularity differences between ground truth and segmentated lesions as indicated by DSC rates in Table 2. As earlier mentioned, volume correlation of ground truth and segmented sets are only meaningful when they are presented with corresponding DSC rates. Since DSC rates are given in Table 2, then we conducted a t-test and Pearson correlation test to find the correlation between volume and circularity measurements of ground truth and computer based calculation. A high Pearson correlation value of R = 0.971, p,0.001 and R = 0.955, p,0.001, for volume and circularity measurements was obtained. Finally, we also determined that the likelihood of these selected features follow normal distribution by using the Shapiro-Wilk test [35]. As a result of this test, entropy and Max.PR features were found to follow a normal distribution, and the rest did not, as summarized in Table 5. To show that Max.PR and entropy follows normal distributions but have significantly different variation, we conducted an F-test [36] between Max.PR and entropy features (F = 0.3042, 95% Confidence interval = [0.1855 0.4987], p = 3.98e-6). Note that conclusions about the utility of features were arrived after LOOCV was conducted for all data as usual in supervised machine learning techniques. Once the conclusions were derived with the help of proposed method, for any unseen baseline features, follow-up morphological changes can be predicted.

Discussion
18 F-FDG-PET imaging demonstrates increased metabolism with high contrast, but localization of the radiotracer uptake is limited by low spatial resolution of PET images. Even though the high contrast between tumor and normal tissue on PET images could diminish the variability in tumor regions, observer variability in delineation of tumor is still high with the qualitative use of PET images. Changing the window level could significantly alter the apparent volume and tumor shape; therefore, the qualitative definition of the target volume and well-defined tumor boundaries using PET images is not straightforward and highly dependent on the image interpreter.  Most of the automated abnormal radiotracer uptake delineation methods-in the tissues-have relied exclusively on thresholding an absolute PET intensity value. Both inconsistency in radiotracer uptake among patients and variability of radiotracer uptake in normal and abnormal tissue, within individual patients, influence the performance of these automated methods. Furthermore, these thresholding methods also disregard the ''texture'' information obtainable on PET images.
One important aspect of our work is that all of the segmentation, feature extraction, and statistical inference methods were based purely on functional PET images. We used this as a hard constraint to maximize the extracted information, and to use this information as a base for the possible incorporation of different information.
Another strong aspect of our feature extraction method is to use adaptive (in discrete sense) window size. A very similar approach for data exploration was recently published by Reshef et al [37]; however, there is also a limitation in this strong adaptive feature extraction method that we proposed here, and similarly in the study of Reshef et al [37]. That is, there is no guaranteed optimal window size for feature selection procedure due to resolution limitation and discreet (and therefore fuzzy) nature of the images-since we are only able to give near-optimal solution for window size selection. This is important due to extracting statistically accurate features from uptake regions, and also because of the finding that variability among local windows often carries more valuable information than the extracted feature themselves. This issue is totally new and subject to further investigations under different conditions; for example, different imaging modalities (i.e., MRI and CT) and a variation of SUV measurements.
In this work, we have not discussed the incorporation of anatomical information into the PET functional domain, but rather presented the broad analysis of morphological characterization, both in shape and in spatial space. As an extension of this study, we aim to adapt our feature extraction method to a subject and modality specific framework, where feature extraction methods optimally find the subject specific functional and anatomical information from abnormal regions (i.e., from CT or MRI scans) and corresponding uptake regions (i.e., from PET scans) simultaneously.

Conclusions
We presented a framework where we automatically segmented radiotracer uptake regions in high accuracy. Our findings show that extracted shape and texture features, as well as SUV measurements from segmented regions, provide broad analysis of morphological characterization of functional information. Our approach produced a unique estimation of morphological features that can be used alone or together with SUV measurements to predict longitudinal changes in volume and shape of uptake regions. We concluded from our experimental results that some of the textural features such as entropy, maximum probability and contrast shading information of local spatial regions, short run emphasis, and variability of these features over different local windows potentially carry the most valuable, and their predictability of morphological change in uptake regions' shape and volume were reported as superior to single intensity based measurements. Integrating the extracted features with SUV measurements may improve our ability to understand the morphological changes of uptake regions over time. We also highlighted how the accurate segmentation expanded our understanding of shape information-extracted from uptake regions-and how well it agreed with the results of landmark studies [5,15].