Feature Learning Based Random Walk for Liver Segmentation

Liver segmentation is an important processing technique for computer-assisted diagnosis. It has attracted considerable attention, and existing methods have achieved effective results. However, liver segmentation from computed tomography (CT) images remains challenging because of the low contrast between the liver and adjacent organs. This paper proposes a feature-learning-based random walk method for liver segmentation from CT images. Four texture features are extracted and classified to obtain, for each pixel of a test image, the probability of belonging to the liver. Seed points on the original test image are selected automatically and used in the random walk (RW) algorithm, which achieves results comparable to those of previous segmentation methods.


Introduction
The liver, which secretes bile, is the largest digestive gland and detoxification organ in the body. Because of its numerous functions, this organ frequently suffers from lesions. According to the World Health Organization, 745 thousand people died of liver cancer in 2015 [1]. The prevention and treatment of liver disease is therefore urgent and has become an active research topic worldwide. Computed tomography (CT) imaging provides accurate anatomic structural information of the liver and its lesions [2,3]. With its high signal-to-noise ratio and high spatial resolution, CT has become an important imaging modality and a basis for the diagnosis and treatment of liver diseases.
Segmentation extracts the structure of the liver and constructs a geometrical representation of its shape. It is indispensable for volume measurement, functional assessment, lesion location, and operation planning [4]. The shape and size of the liver differ remarkably among individuals. Manual extraction of the liver structure remains the primary procedure applied by clinicians, but it is time-consuming and relies on subjective judgment. Numerous segmentation methods have been developed; detailed overviews can be found in recent extensive reviews [5][6][7]. However, precise liver segmentation remains one of the most challenging tasks in medical image processing because of the sharp corners and concave regions of the liver and its similar intensity to other organs.
Chi et al. [8] proposed an improved active contour model for liver segmentation in which template matching, K-means clustering, and a snake model are combined to constrain the deformation of the active contour as it approaches the object boundary. Rikxoort introduced registration [9] to constrain the deformation of the model. A KNN classifier was used to roughly segment the liver from the background, and accurate segmentation was achieved by B-splines [10]. Seghers used a priori knowledge of the liver boundary [11] to construct an external force constraint model, which enhanced the local segmentation accuracy of the liver to a certain extent. However, it is difficult to maintain the smoothness of the liver contour when approaching hepatic depressions and sharp corners. The validity and robustness of the internal/external constraint models in these deformable-model methods should be improved.
Hufnagel et al. proposed a statistical shape model for liver segmentation [12]. Principal component analysis (PCA) was used to represent the shape of livers. Features manually extracted from the test liver were matched with those from training livers. The statistical shape model was iteratively deformed in 3D space to obtain the final segmentation result. Meanwhile, Saddi used a Gaussian mixture model to initialize the statistical shape model [13]. A gradient descent method was used to minimize the energy function of the level set after the initial boundary of the level set was manually obtained. Afterward, the statistical shape model and the level set shape were registered to determine the boundary of the liver. Kainmüller et al. built a shape constraint plane using the positional relation between the vertices of the statistical model and their neighborhood points to determine the scale and range of deformation and thereby remedy over-segmentation. However, the setting of the initial contour should be improved because exact matching between prior shape models is difficult. Moreover, personalized information, which is indispensable for precise segmentation, was not considered in building the energy of the statistical shape model.
Furukawa et al. [14] proposed a segmentation method combining maximum a posteriori estimation and a level set. A probabilistic atlas was built to constrain the energy function of the level set, which was driven to approach the boundary of the liver. Slagmolen et al. [15] proposed a method based on interactive probabilistic atlas registration. The average model was obtained after all training images were registered using manually selected landmark points. Meanwhile, the test image was roughly segmented by thresholding. The final segmentation result was achieved by deforming the average model after non-rigid registration between the average model and the rough segmentation. Although a relatively accurate segmentation result was obtained, building the probabilistic atlas is subjective and time-consuming.
Many fully automatic methods have been proposed for liver segmentation. To cope with inconsistent contrast enhancement and spurious imaging artifacts, Marius [16] proposed an affine-invariant shape parameterization for liver segmentation and refined the result with a geodesic active contour. Huang [17] proposed a hybrid approach combining liver intensity range detection, atlas-based affine and non-rigid registration, and shape-constrained diffeomorphic demons. Based on the level set framework, Wimmer [18] used a boundary model, a region model, and a shape model to avoid a parameterization of the target shape. Kainmuller [19] combined a statistical deformable model with a constrained free-form deformation to compute the displacements and initial positioning of the model.
In this paper, a feature-learning-based random walk method (FLRW) is presented for liver segmentation from CT images. Four kinds of texture features are extracted and fused to train a hybrid classifier that uses SVMs as weak classifiers for Adaboost. A liver-specific probabilistic image is generated for each unlabeled image. The final segmentation is obtained by a random walk combined with the generated probabilistic images. The main contributions of this work are summarized as follows: (1) a probabilistic image that captures the relationship between pixels is generated as a pre-segmentation result; (2) texture features, rather than intensity alone, are used to improve the weights, which are significant to the random walk for segmentation; (3) the segmentation framework is applicable to two different databases even though only a few training images from one database are used.
The rest of this paper is organized as follows. After the introduction, the methodology, including probabilistic image learning and liver boundary determination, is described in Section 2. Evaluation results are presented in Section 3. Discussion and conclusions are presented in Section 4.

Methodology
The proposed method consists of a learning-based probabilistic imaging step and random walk-based boundary determination step. All the images, namely, training and test images, are denoised before feature extraction of each pixel. A hybrid classifier is used after feature fusion to obtain the probability of the test image, which indicates the likelihood of the pixel to belong to the liver. Automatic random walk-based refinement is applied to achieve the final segmentation result. The flowchart of the proposed method is shown in Fig 1.

Learning based probabilistic image
Given a target image $T \in \mathbb{R}^{H \times W}$, the liver segmentation problem is formulated as assigning each pixel $x \in T$ a label $l \in \{0, 1\}$, with liver label $l = 1$ and background label $l = 0$. Here, the probability $p(l = 1 \mid x)$ of each pixel belonging to the liver is estimated with a feature-learning-based method.

Pixel representation.
Pixel representation is achieved by window-based feature extraction. The values in a larger spatial neighborhood (feature window P) of size h×w are used to describe the center pixel x. Texture features have achieved satisfactory results for medical images in various tasks. In this study, four representative texture features, namely, local binary pattern (LBP) [20,21], gray level co-occurrence matrix (GLCM) [22], Haar [23], and histogram of oriented gradient (HOG) [24], were selected.
(A) LBP. LBP is a non-parametric operator that describes the local spatial structure of an image; it is invariant to illumination change and fast to compute. The value of the center pixel x is used as a threshold and compared with its spatial neighbors to obtain a binary code that describes the texture. After a neighborhood radius r is defined, the N pixels at radius r around x are processed to construct the texture feature. Given the intensity $P_x$ of the center pixel x and the intensities $P_n$ ($n = 1, 2, \ldots, N$) of its spatial neighbors, a binary pattern is obtained by comparing $P_n$ with $P_x$ in clockwise or counter-clockwise order. Each digit of the binary pattern is expressed as

$$s(P_n - P_x) = \begin{cases} 1, & P_n \ge P_x \\ 0, & P_n < P_x \end{cases}$$

The LBP value is obtained by converting the binary pattern to a decimal number:

$$LBP(x) = \sum_{n=1}^{N} s(P_n - P_x)\, 2^{n-1}$$

The LBP texture feature for x is

$$F_{LBP}(x) = [LBP(x_1), LBP(x_2), \ldots, LBP(x_{h \times w})]$$

where $x_j$, $j \in [1, h \times w]$, are the spatial neighbors of x in the feature window.
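As an illustration of the thresholding and binary-to-decimal conversion described above, the following minimal sketch computes the LBP code of the center pixel of a 3×3 patch. It assumes r = 1 and N = 8, a clockwise traversal starting at the top-left neighbor, and the convention that a neighbor equal to the center yields a 1; none of these choices is stated in the text, and the function name `lbp_code` is hypothetical.

```python
def lbp_code(patch):
    """Return the 8-neighbour LBP code of the centre pixel of a 3x3 patch.

    Assumptions (not from the paper): radius 1, 8 neighbours, clockwise
    order starting at the top-left corner, and '>=' as the comparison.
    """
    center = patch[1][1]
    # Neighbour coordinates (row, col), visited clockwise from the top-left.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for n, (r, c) in enumerate(offsets):
        bit = 1 if patch[r][c] >= center else 0  # threshold against the centre
        code += bit << n                          # weight the n-th bit by 2**n
    return code


patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # 241
```

Applying the same function to every pixel of the h×w feature window and collecting the codes gives a feature vector of the kind described for x.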
(B) GLCM. The GLCM describes how often pairs of intensities occur at a given distance and angle between pixels and is used to extract second-order statistical texture features. Comprehensive information, namely, direction, distance, variation range, and variation speed, is expressed by the relation between two intensities at a certain distance and direction. In the feature window, the GLCM gives the probability p(a,b|d,θ) that intensity value a co-occurs with intensity value b at a specific spatial distance d and direction θ. The number of gray levels N ($a, b \in N$) in an image determines the size of the GLCM (N × N). A number of GLCMs are produced according to different values of d and θ. For each GLCM, twelve textural features are used to measure the characteristics of the texture statistics. These features are energy, contrast, correlation, homogeneity, entropy, autocorrelation, dissimilarity, cluster shade, cluster tendency, maximum probability [25], statistics variance, and sum mean [26].
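For each x at the center of the feature window, the GLCM feature is constructed from the intensity values in the feature window, as shown in Eq 4:

$$F_{GLCM}(x) = [f_1^{d_j,\theta_j}, f_2^{d_j,\theta_j}, \ldots, f_{12}^{d_j,\theta_j}] \tag{4}$$

where $f_1^{d_j,\theta_j}, f_2^{d_j,\theta_j}, \ldots$, and $f_{12}^{d_j,\theta_j}$ are the 12 textural feature statistics from one GLCM with distance $d_j$ and direction $\theta_j$.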
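The construction of one GLCM and a few of its statistics can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: it assumes a non-symmetric co-occurrence count, expresses (d, θ) as a pixel offset `(dr, dc)`, assumes that at least one pixel pair fits inside the window, and computes only three of the twelve statistics (energy, contrast, entropy) with their commonly used definitions. The function names are hypothetical.

```python
import math


def glcm(window, levels, dr, dc):
    """Normalised co-occurrence matrix p(a, b) for pixel pairs offset by (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
                total += 1
    # Assumes total > 0, i.e. the offset fits inside the window.
    return [[v / total for v in row] for row in counts]


def glcm_statistics(p):
    """Three common GLCM statistics (assumed standard definitions)."""
    cells = [(a, b, v) for a, row in enumerate(p) for b, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((a - b) ** 2 * v for a, b, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


window = [[0, 0, 1],
          [0, 1, 1],
          [1, 1, 1]]
p = glcm(window, levels=2, dr=0, dc=1)  # d = 1, theta = 0 degrees
stats = glcm_statistics(p)
print(p, stats)
```

Repeating the computation for several offsets and concatenating the statistics yields one feature vector per feature window.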
(C) Haar. Haar is used as an appearance feature because it can be computed efficiently using integral images [27]. Haar consists of four sets of features, namely, edge, line, center-surround, and special diagonal line features. All features are obtained by 15 feature filters composed of white and black rectangles in specific arrangements [28]. Each feature is a scalar obtained by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle. For each x at the center of the feature window, the Haar features are constructed from the feature filters and the intensity values in the feature window, as shown in Eq 5:

$$F_{Haar}(x) = [f_1, f_2, \ldots, f_{15}] \tag{5}$$

where $f_1, f_2, \ldots$, and $f_{15}$ are the Haar features obtained with the 15 feature filters.
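The role of the integral image can be illustrated with a single two-rectangle edge filter. The sketch below is only an example under stated assumptions: the layout of the filter (white left half, black right half) and the function names are hypothetical, and the other 14 filters are not reproduced.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img[0:r][0:c]; it has one extra zero row and column."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four look-ups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_edge_vertical(ii, top, left, height, width):
    """Edge feature: black (right half) sum minus white (left half) sum."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return black - white


img = [[1, 1, 5, 5],
       [1, 1, 5, 5],
       [1, 1, 5, 5],
       [1, 1, 5, 5]]
ii = integral_image(img)
print(haar_edge_vertical(ii, 0, 0, 4, 4))  # 32
```

Because every rectangle sum costs four look-ups regardless of its size, all filters can be evaluated in constant time per feature window once the integral image is available.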
(D) HOG. HOG [29] is an effective appearance feature that complements the Haar features and improves performance when used together with them. The density distribution of gradient or edge directions in HOG describes local appearance and shape well. The gradient magnitude G and direction θ are calculated from the horizontal gradient $G_x$ and the vertical gradient $G_y$ of the entire image:

$$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\frac{G_y}{G_x}$$

The HOG feature is then formed by accumulating, for each angular bin $\theta_k$, the gradient magnitudes of the pixels j in the feature window whose direction falls into that bin, where j is the location of the pixel in the feature window and $\theta_k$ is the kth angular bin.
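A simplified version of the orientation histogram can be sketched as follows. The sketch assumes central differences for $G_x$ and $G_y$, unsigned directions in [0°, 180°), nine angular bins, and magnitude-weighted voting without block normalization; these are common HOG conventions, not settings reported in the paper, and `hog_histogram` is a hypothetical name.

```python
import math


def hog_histogram(window, bins=9):
    """Magnitude-weighted histogram of unsigned gradient directions (0-180 degrees)."""
    rows, cols = len(window), len(window[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Border pixels are skipped because central differences need both neighbours.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = window[r][c + 1] - window[r][c - 1]  # horizontal central difference
            gy = window[r + 1][c] - window[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


horizontal_ramp = [[10 * c for c in range(4)] for _ in range(4)]
vertical_ramp = [[10 * r] * 4 for r in range(4)]
print(hog_histogram(horizontal_ramp))
print(hog_histogram(vertical_ramp))
```

For the horizontal ramp all gradient energy falls into the first bin (0°), and for the vertical ramp into the bin containing 90°, which shows how the histogram encodes edge direction.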

Feature fusion.
The texture feature of a pixel is obtained by concatenating the four features (LBP, GLCM, Haar, and HOG):

$$F(x) = [F_{LBP}(x), F_{GLCM}(x), F_{Haar}(x), F_{HOG}(x)]$$

The high dimensionality of the concatenated features may result in information redundancy and high computational cost. Principal component analysis (PCA), which measures the correlations between elements, is used to select the most useful information and improve performance. Let $F = [f_1, f_2, \cdots, f_N]$ denote the N training samples, where $f_n$, $n = 1, 2, \ldots, N$, is the texture feature of pixel $x_n$. After the samples are centered to obtain $\bar{F}$, the covariance matrix $\bar{F}\bar{F}^{T}$ is formed to find the eigenvector matrix $U = [u_1, u_2, \cdots, u_L]$ and the eigenvalue matrix $\Lambda$. The eigenvectors $u_d$ associated with the D largest eigenvalues form the projection subspace:

$$U_D = [u_1, u_2, \cdots, u_D]$$

The fused texture feature is obtained by projecting the original feature into the PCA subspace:

$$y = U_D^{T} f$$

Hybrid classifier for the pixel forecast.
A hybrid classifier is used to forecast the probability that a pixel belongs to the liver [30]. Adaboost [31] builds a collection of SVMs [32] as weak classifiers ($h_t^{SVM}$, $t = 1, 2, \ldots, T$) with adaptive weights $\alpha_t$. Combining the weak classifiers improves performance and avoids having to select the optimal parameters of a single classifier. For the N training samples $\{(y_1, l_1), (y_2, l_2), \ldots, (y_N, l_N)\}$, with $l_n$ as the label of each training pixel, the weights of the training samples are initialized to $w_n^t = 1/N$ for $t = 1$. The probability of the positive class in the two-class problem is then estimated as described in [33]. The probabilistic image is formed from the liver-likelihood estimate of each pixel.

Random walks based liver boundary determination
The probabilistic image provides prior knowledge but does not use the relationship of pixels to each other. It is therefore inappropriate for directly segmenting the liver, which has a smooth and continuous boundary. The random walk algorithm [34], which considers the spatial nature of an image, incorporates this prior knowledge to achieve an accurate segmentation result.

An adaptive threshold method is applied to the original image to automatically initialize the seed points required by the random walk. The histograms of each slice of the training images are investigated; the last peaks in the histograms denote the intensity range of the livers. Fig 2 shows the gray value distribution of one image (upper line) and of the liver part (lower line) [35]. Prior knowledge states that the gray level range of the liver is between 125 and 155 [36]. After the peak in this range is extracted, the liver is separated from non-liver tissues by determining the thresholds $s_1$ and $s_2$. For a contrast-enhanced CT image, $s_1$ is 3 × 13 HU less than the peak and $s_2$ is 2 × 13 HU more than the peak [37]. For a test image, the binary images representing the liver and the background are expressed as follows:

$$g_1(x) = \begin{cases} 1, & s_1 \le T(x) \le s_2 \\ 0, & \text{otherwise} \end{cases} \qquad g_2(x) = \begin{cases} 1, & T(x) < s_1 \ \text{or}\ T(x) > s_2 \\ 0, & \text{otherwise} \end{cases}$$

Erosion is applied to the binary image $g_1$ containing the liver region and to $g_2$ containing the background, and the maximum connected region is found in $g_1$. The seed points labeled as liver, $seed_{in} = 1$, and as background, $seed_{out} = 0$, are selected in $g_1$ and $g_2$, respectively. $seed_{in}$ and $seed_{out}$ serve as the seed points of the random walk algorithm, which further determines the liver boundary in the image.

The original random walk algorithm segments the liver using only intensity information and ignores texture feature information. In this paper, the probabilistic image obtained from the texture feature information is combined with the original image to determine the liver boundary.
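The threshold selection can be sketched in a few lines. The example below is a minimal illustration that assumes the histogram is indexed by the same gray-level units as the 125 to 155 range and the 13-unit step, and it omits the erosion and connected-component steps; the function names and the toy histogram are hypothetical.

```python
def liver_thresholds(histogram, low=125, high=155, step=13):
    """Find the histogram peak inside [low, high] and derive s1 and s2 from it."""
    peak = max(range(low, high + 1), key=lambda g: histogram[g])
    s1 = peak - 3 * step  # lower threshold: 3 x 13 below the peak
    s2 = peak + 2 * step  # upper threshold: 2 x 13 above the peak
    return s1, s2


def binary_masks(image, s1, s2):
    """g1 marks candidate liver pixels; g2 marks candidate background pixels."""
    g1 = [[1 if s1 <= v <= s2 else 0 for v in row] for row in image]
    g2 = [[1 - v for v in row] for row in g1]
    return g1, g2


# Hypothetical histogram: a peak at 140 inside the prior range, a larger one outside it.
hist = [0] * 256
hist[130], hist[140], hist[200] = 20, 50, 90
s1, s2 = liver_thresholds(hist)
g1, g2 = binary_masks([[90, 120], [166, 200]], s1, s2)
print(s1, s2, g1, g2)
```

Restricting the peak search to the prior range keeps brighter structures, such as the hypothetical peak at 200, from being mistaken for the liver peak.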
The weight between two neighboring pixels is defined in Eq 14, where i and j are the indices of pixels in both the original image T and the probabilistic image p, and α and β are adjustment parameters. The liver-likelihood estimate $x_i$ of each pixel is obtained by minimizing the following objective function subject to the labeled seed points:

$$\arg\min_{x} \frac{1}{2} \sum_{e_{ij}} w_{ij} (x_i - x_j)^2$$

where $w_{ij}$ is the weight of the edge $e_{ij}$ between neighboring pixels i and j. If $x_i$ is greater than 1/2, the pixel is labeled 1, i.e., inside the liver region. Otherwise, the pixel is labeled 0, i.e., outside the liver.
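To make the role of the two information sources concrete, the following sketch runs a random walk on a one-dimensional chain of pixels. It is an illustration under explicit assumptions, not the formulation of Eq 14: the weight is assumed to be a Gaussian function of both the intensity difference (scaled by β) and the probability difference (scaled by α), the Dirichlet problem is solved by simple Gauss-Seidel sweeps instead of a sparse linear solver, and all names and values are hypothetical.

```python
import math


def edge_weight(t_i, t_j, p_i, p_j, alpha, beta):
    # Assumed form: Gaussian weighting of intensity and probability differences.
    return math.exp(-beta * (t_i - t_j) ** 2 - alpha * (p_i - p_j) ** 2)


def random_walk_chain(intensity, prob, seeds, alpha=10.0, beta=10.0, sweeps=500):
    """Minimise sum_ij w_ij (x_i - x_j)^2 on a chain with fixed seed values.

    seeds maps a pixel index to its fixed label (1 = liver, 0 = background).
    """
    n = len(intensity)
    w = [edge_weight(intensity[i], intensity[i + 1], prob[i], prob[i + 1], alpha, beta)
         for i in range(n - 1)]
    x = [float(seeds.get(i, 0.5)) for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i in seeds:
                continue  # seed values stay fixed
            num = den = 0.0
            if i > 0:
                num += w[i - 1] * x[i - 1]
                den += w[i - 1]
            if i < n - 1:
                num += w[i] * x[i + 1]
                den += w[i]
            x[i] = num / den  # weighted mean of the neighbours (harmonic condition)
    labels = [1 if v > 0.5 else 0 for v in x]
    return labels, x


intensity = [0.90, 0.90, 0.85, 0.30, 0.25, 0.20]   # normalised gray values
prob = [0.95, 0.90, 0.90, 0.10, 0.10, 0.05]        # liver probabilities
labels, x = random_walk_chain(intensity, prob, seeds={0: 1, 5: 0})
print(labels)  # [1, 1, 1, 0, 0, 0]
```

The weak edge between the third and fourth pixels, where both intensity and probability change sharply, is where the labels switch; on a 2D image the same minimization is usually carried out by solving a sparse linear system over the 4- or 8-connected pixel graph.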

Experimental Results
To validate the proposed method, we tested it on two databases: (1) MICCAI 2007 grand challenge data and (2) clinical cirrhosis data. The scans in both databases were enhanced with contrast agent and acquired in the central venous phase. All CT scans were acquired in the transversal direction and have manually segmented livers. For the MICCAI 2007 grand challenge data, the number of slices in each scan varied between 64 and 394, with a resolution of 512×512. Pixel spacing varied between 0.55 mm and 0.8 mm, whereas inter-slice distance varied from 1 mm to 3 mm. The clinical cirrhosis data used in this study were provided by the Chinese Academy of Medical Sciences and Peking Union Medical College, and the study on these data was approved by the institutional ethical review board. The patients involved in our study provided written consent. The number of slices in each scan varied between 71 and 195, with a resolution of 512×512.
Fifteen scans were randomly selected from the MICCAI 2007 grand challenge data. Instead of the whole CT scan, only the slice containing the largest liver cross-section in each scan was used for training. Thus, fifteen slices from the MICCAI 2007 grand challenge data served as the training data for segmenting the livers in four scans randomly selected from the MICCAI 2007 grand challenge database and three scans from the clinical cirrhosis database.

Objective evaluation
Several coefficients that reflect how well two segmented livers match are computed to evaluate the performance of the proposed method. Denoting the manually segmented gold standard as A and the automatically segmented liver as B, we compute the accuracy rate (ACC), the volume overlap error (VOE), and the relative volume difference (RVD) [38].
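For orientation, the two overlap measures with widely used definitions can be computed as in the sketch below. It assumes the commonly used forms VOE = 100·(1 − |A∩B|/|A∪B|) and RVD = 100·(|B| − |A|)/|A|, which may differ in detail from those used in this study; ACC is left out because its definition cannot be recovered from the text. The function name and the toy voxel sets are hypothetical.

```python
def overlap_metrics(gold, auto):
    """VOE and RVD in percent for two sets of liver voxel coordinates.

    gold is the reference segmentation A, auto is the automatic segmentation B.
    Both definitions are the commonly used ones and are assumptions here.
    """
    intersection = len(gold & auto)
    union = len(gold | auto)
    voe = 100.0 * (1.0 - intersection / union)
    rvd = 100.0 * (len(auto) - len(gold)) / len(gold)
    return voe, rvd


gold = {(0, 0), (0, 1), (1, 0), (1, 1)}
auto = {(0, 1), (1, 1), (1, 2)}
voe, rvd = overlap_metrics(gold, auto)
print(voe, rvd)
```

In this toy case two of the five voxels in the union are shared and the automatic result is one voxel smaller than the reference, so VOE is about 60% and RVD is −25%. A VOE of 0 indicates perfect overlap, and the sign of RVD indicates over- or under-segmentation.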
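Moreover, three surface-distance measures, namely, average surface distance (ASD), root mean squared error (RMSE), and maximum surface distance (MSD), were used to compare different segmentation methods:

$$ASD(A,B) = \frac{1}{|S(A)| + |S(B)|} \left( \sum_{s_A \in S(A)} d(s_A, S(B)) + \sum_{s_B \in S(B)} d(s_B, S(A)) \right)$$

$$RMSE(A,B) = \sqrt{\frac{1}{|S(A)| + |S(B)|} \left( \sum_{s_A \in S(A)} d^2(s_A, S(B)) + \sum_{s_B \in S(B)} d^2(s_B, S(A)) \right)}$$

$$MSD(A,B) = \max \left\{ \max_{s_A \in S(A)} d(s_A, S(B)),\ \max_{s_B \in S(B)} d(s_B, S(A)) \right\}$$

where $S(\cdot)$ is the set of surface voxels of the region $\cdot$, $s_{\cdot}$ is one of the voxels on the surface of the region $\cdot$, and $d(s_{\cdot 1}, S(\cdot_2)) = \min_{s_{\cdot 2} \in S(\cdot_2)} \| s_{\cdot 1} - s_{\cdot 2} \|$ is the minimum Euclidean distance between a surface voxel of one region and the surface of the other. Higher ACC and lower values of the other measures indicate better segmentation.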
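The three surface-distance measures can be sketched from the minimum-distance definition above. The example assumes that the surface voxels of both segmentations have already been extracted and converted to millimetre coordinates, and it uses the symmetric definitions in which distances from both surfaces are pooled; the brute-force nearest-neighbour search and the function names are illustrative choices only.

```python
import math


def min_distance(point, surface):
    """Minimum Euclidean distance from one surface voxel to another surface."""
    return min(math.dist(point, other) for other in surface)


def surface_distances(surface_a, surface_b):
    """Return (ASD, RMSE, MSD) from pooled distances in both directions."""
    d = [min_distance(p, surface_b) for p in surface_a]
    d += [min_distance(q, surface_a) for q in surface_b]
    asd = sum(d) / len(d)
    rmse = math.sqrt(sum(v * v for v in d) / len(d))
    msd = max(d)
    return asd, rmse, msd


surface_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
surface_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
asd, rmse, msd = surface_distances(surface_a, surface_b)
print(asd, rmse, msd)
```

For real volumes with many thousands of surface voxels, a spatial index such as a k-d tree would normally replace the quadratic search.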

Training sample selection
Using all pixels in the training images as training samples produces redundant information because texture features within one image are similar. Moreover, the proportion of positive and negative samples may influence the final classification results. As mentioned above, fifteen slices in total are used for training. For each training slice, 21000 training samples are randomly selected from three areas, namely, inside the liver (7000 samples), outside the liver (7000 samples), and near the liver edges (7000 samples). Fig 3 shows the selected training samples.

Parameter selection
Parameter β of the original random walk segmentation method decides the smoothness of the segmented contour. Given the optimized parameters, we quantitatively examine the variation trends of objective evaluations with respect to the influences of parameter β. Specifically, we perform the liver segmentation with values of parameter β from 10 to 250. The changes of the VOE, RVD, ASD,

Probability image
For a test image, the probability image is obtained by classifying each pixel as liver or non-liver on the basis of the extracted texture features. Fig 6 shows the probability images for three different slices of the test image. The liver area is clearly displayed with high brightness and a relatively clear edge. The tumor area, indicated by the red arrow, differs more markedly from the normal liver in the probability image than in the original image, which visually demonstrates the effectiveness of the extracted features. On the other hand, the location of the liver is not exactly the same in different patients because of the complexity of the anatomical structures. Thus, the texture features are extracted from all the pixels in the images, and the probability images are built for all the organs. Since the intensities of the stomach, the heart, and the subcostal fat of the rib cage are very similar to that of the liver, the probability image alone is inadequate for liver segmentation. All the organs with different probabilities are shown in Fig 6(b1), 6(b2) and 6(b3). We further increase the robustness of the liver segmentation result using a random walk that is improved by combining intensity information and texture feature information.

Comparison on 2D slices of MICCAI data
Segmentation results obtained by the proposed method were compared with those obtained by the original random walk method using the same seed points. Table 1 quantitatively compares the ACC, VOE, RVD, FNR, and FPR of the RW and the FLRW. The average ACC of the FLRW was 95.18%, which was higher than that of the RW. The average VOE, RVD, and FPR of the FLRW were 9.17%, 9.30%, and 0.82%, respectively, which were much lower than those of the RW. However, a few liver voxels were classified as non-liver because of inconspicuous features near the liver edge: the FNR of the FLRW was 8.36%, slightly higher than that of the RW. [...] contours illustrate the overlapping parts. Although the proposed method could not exactly segment the parts of the liver edge whose intensity is very similar to that of the background, the overall segmentation results still conformed closely to the gold standard.

After all slices in the 3D data were segmented, the 3D segmentation results are shown in Fig 9 in the coronal, vertical, and transverse planes. The segmented edge obtained by the proposed method accurately fitted the liver surface even in sunken areas, which illustrates the effectiveness of the proposed segmentation method. The 3D rendering results are shown in Fig 10, in which (a1~d1) and (a2~d2) are the gold standard and the FLRW results, respectively, whereas (a3~d3) are the fused displays with 3D surface distance error maps. Most areas have a small distance error, even in corner regions. The area connected with the vessel has a large distance error, which should be improved in future work.

Segmentation results on 3D data of MICCAI data
Statistical distributions of the surface distance errors for the different test data are shown in Fig 11. Most of the surface distance errors were in the range of 0 mm to 3 mm. Voxels with surface distance errors of less than 2 mm accounted for more than 90% of all voxels. This shows numerically that the FLRW achieves a relatively accurate segmentation result. Tables 2 and 3 list the quantitative evaluation of the segmentation results for the four test data in the MICCAI 2007 grand challenge database and the three data in the clinical cirrhosis database. Comprehensive evaluation criteria were provided by the MICCAI 2007 grand challenge and are shown in Table 4. The gold standard is used as the reference to calibrate the scores of the test data; its performance corresponds to 75 out of 100 points (Table 4). The corresponding score for the test data is obtained by [7]

$$S = \max\left(100 - 25\,\frac{\varepsilon_i}{\varepsilon},\ 0\right)$$

where $\varepsilon_i$ is the quantitative evaluation value of the test data, and $\varepsilon$ is the evaluation standard value.

Quantitative error analysis for both databases
Table 5 displays the scores of different methods for comparison. The methods of Heimann [39], Saddi [40], van Rikxoort [41], and Gauriau [42] were evaluated on the MICCAI 2007 grand challenge database, as was the proposed FLRW, and with the same criteria. The comparison shows that the proposed FLRW was superior to three of the methods in total score and equal to or better than the fourth method in ASD and RMSE. Moreover, the FLRW achieved the smallest MSD. For the clinical cirrhosis database, the scores are shown in Table 6. Although the training slices were selected from the MICCAI 2007 grand challenge database, the results remain competitive with those in Table 5.
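The calibration of an error value into a score can be sketched as follows, assuming the linear rule in which zero error earns 100 points, the reference value earns 75 points, and scores are floored at 0. Taking the absolute value (so that a negative RVD is penalized like a positive one) is an additional assumption, and the reference value in the example is hypothetical rather than taken from Table 4.

```python
def metric_score(value, reference):
    """Score one error measure: 100 at zero error, 75 at the reference value, never below 0."""
    return max(100.0 - 25.0 * abs(value) / reference, 0.0)


reference = 2.0  # hypothetical reference error, not a value from Table 4
print(metric_score(0.0, reference))   # 100.0
print(metric_score(2.0, reference))   # 75.0
print(metric_score(-4.0, reference))  # 50.0
print(metric_score(20.0, reference))  # 0.0
```

Averaging such scores over the individual error measures gives a single total score per test case, which allows comparison across methods.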

Conclusion
We propose an automatic liver segmentation method based on feature learning and random walk. Four texture features were extracted and fused to represent each pixel in an image. The probability image was then calculated to enhance the liver. An improved random walk with automatically selected seeds, which combines the probability image with the original image, was used to achieve effective segmentation results. The proposed method was compared with other methods using eight different measures, namely, ASD, RMSE, MSD, ACC, VOE, RVD, FNR, and FPR, on the MICCAI 2007 grand challenge database and the clinical cirrhosis database. The calibrated scores of the test data were also investigated, and the results further support the effectiveness of the proposed segmentation method.