A Rough Set Bounded Spatially Constrained Asymmetric Gaussian Mixture Model for Image Segmentation

Accurate image segmentation is an important issue in image processing, where Gaussian mixture models play an important part and have been proven effective. However, most Gaussian mixture model (GMM) based methods suffer from one or more limitations, such as limited noise robustness, over-smoothness for segmentations, and lack of flexibility to fit data. In order to address these issues, in this paper, we propose a rough set bounded asymmetric Gaussian mixture model with spatial constraint for image segmentation. First, based on our previous work where each cluster is characterized by three automatically determined rough-fuzzy regions, we partition the target image into three rough regions with two adaptively computed thresholds. Second, a new bounded indicator function is proposed to determine the bounded support regions of the observed data. The bounded indicator and posterior probability of a pixel that belongs to each sub-region is estimated with respect to the rough region where the pixel lies. Third, to further reduce over-smoothness for segmentations, two novel prior factors are proposed that incorporate the spatial information among neighborhood pixels, which are constructed based on the prior and posterior probabilities of the within- and between-clusters, and considers the spatial direction. We compare our algorithm to state-of-the-art segmentation approaches in both synthetic and real images to demonstrate the superior performance of the proposed algorithm.


Introduction
As one of the classical problems in image processing, image segmentation has been extensively studied, which can be treated as a classification problem [1][2][3][4][5] for the target image. Various image segmentation algorithms have been developed such as active contour models [6,7], graph based methods [8,9] and clustering techniques [10][11][12]. Over the last decades, modelbased techniques [13,14] have been widely used in image segmentation, where the standard Gaussian mixture model (GMM) [15,16] is a well-known method because of its simplicity and ease of implementation [17]. The parameters involved in GMM can be efficiently estimated by expectation maximization (EM) algorithm [18]. However, the standard GMM still suffers from the following limitations: sensitivity to noise, less flexibility to fit the shape of the data and unbounded distributions [19].
In order to reduce noise sensitivity for segmentation, (hidden) Markov random fields ((H) MRF) based mixture models have been widely utilized for pixel labels [20][21][22], where (H)MRF is acted on neighboring labels, and the clustering result for each pixel depends on the neighboring pixels [23]. On the other hand, in order to impose spatial constraints among neighboring pixels, another group of mixture models with MRF has been proposed by modeling the joint distribution of the priors for each pixel [24][25][26][27]. For example, in [26], Displaros et al. proposed a generative GMM model by introducing a pseudo-likelihood quantity to incorporate the spatial smoothness constraints based on Kullback-Leibler (KL) divergence. In [27], Nikou et al. proposed a novel spatial constraint that can adaptively select spatial directions. In order to directly apply the EM algorithm to estimate the involved parameters, Nguyen and Wu [23] proposed a robust spatially constrained GMM by introducing a spatial factor into the prior distribution. Although the forementioned algorithms can reduce the impact of noise in the image, most (H)MRF based algorithms are still not sufficiently robust with respect to different noise types and levels.
Because of the utilization of Gaussian distribution in GMM, the distribution tail is often shorter for many applied problems [17], which means that the Gaussian distribution is not sufficiently flexible to fit data [19]. In order to improve the flexibility for the data fitness, the Student's-t distribution, Laplace distribution and generalized Gaussian distribution are used to replace the Gaussian distribution in mixture model. Therefore, the Student's-t mixture model (SMM) [28,29], Laplace mixture model (LMM) [30,31], and generalized Gaussian mixture model (GGMM) [32,33] have been proposed. On the other hand, using only one distribution for each component in the mixture model is not sufficiently satisfactory for many practical applications. Therefore, another solution for fitting data with different shapes is using multiple distributions for each component. For example, Zhang et al. [34] proposed a modified GMM by incorporating local spatial and intensity information. The conditional probability for each pixel is constructed based on the probabilities of neighboring pixels [34]. Browne et al. [35] proposed a mixture of mixture model by combining a multivariate Gaussian distribution and a multivariate uniform distribution together to model the component density [35]. Nguyen et al. [19] proposed an asymmetric mixture model by modeling the component with multivariate Gaussian distributions [19].
Moreover, the distributions in most mixture models are unbounded with a supporting range of (−1, +1), which is not consistent with the practical application where the practical data generally fall in a bounded region [19]. In [36], a bounded GMM (BGMM) was proposed for speech processing. In [37], a bounded generalized GMM was proposed that included GMM, LMM, GGMM, and BGMM as special cases. Nguyen et al. [17,19,38] proposed various bounded mixture models to fit different data shapes. However, the above mentioned approaches still suffer from the following limitations: (1) Without considering any spatial information, the mixture of mixture model [35] and the bounded asymmetric mixture model (BAMM) [19] are still sensitive to noise, although both of these types of models are more flexible.
(2) For the bounded mixture models [19,[36][37][38], the bounded support regions of observed data should be predefined. Moreover, the indicator function of the bounded support region is a binary function that cannot easily manage uncertainty, vagueness, and incompleteness in data.
Motivated by the aforementioned observations, in this paper, we propose a rough set bounded asymmetric Gaussian mixture model with spatial constraint for image segmentation. First, in our previous work [39], based on the rough set theory [40], we proposed a generalized rough fuzzy c-means (GRFCM) algorithm, where, for each cluster, an image is automatically partitioned into three rough regions with two adaptively computed thresholds. In this paper, we utilize these two thresholds to partition the target image into three rough regions, i.e., the positive, boundary and negative regions [41]. Second, a new bounded indicator function is proposed to determine the bounded support regions of the observed data. The bounded indicator of a pixel that belongs to each sub-region is estimated with respect to the rough region where the pixel lies. Only those pixels in the positive and boundary regions have non-zero indicators. Therefore, because of the benefits of rough set theory, the proposed bounded indicator function can further manage uncertainty in data. Third, to further overcome the impact of noise and reduce over-smoothness for segmentations, two novel prior factors are proposed to introduce the spatial information. The proposed prior factors can be treated as the withinand between-cluster spatial constraints with spatial direction. Finally, to further improve the robustness of the model, for each component, the posterior probabilities of within-and between-cluster for each pixel are estimated with respect to the rough regions. The proposed algorithm is compared to several state-of-the-art segmentation algorithms on simulated and real images to demonstrate its superior performance.

Finite Mixture Model
The notations used throughout this paper are as follows. The target image is denoted as X = {x i , i = 1, 2, . . ., N}, where x i with dimension D is the intensity values for the ith pixel. The neighborhood of the ith pixel is denoted as @ i , and the labels are denoted as (O 1 , O 2 , . . ., O K ). In order to segment an image with N pixels into K labels, the density function of the finite mixture model [42] is given by: where P = {π ik }, i = {1, 2, . . ., N}, k = {1, 2, . . ., K} are the prior probabilities, and satisfy the constraints 0 π ik 1 and P K k¼1 p ik ¼ 1. In GMM [15,16], the component p(x i |O k ) is the Gaussian distribution Φ(x i |μ k , S k ) that can be written in the form: where μ k is the mean vector with D dimension, S k is the covariance matrix with D × D dimension, and |S k | is the determinant of S k . In order to address the issue that the observed data generally fall within the bounded support regions in practical applications, in [19,36,37], the bounded support region in < D is defined as @ O k for each label O k , and the indicator function can be written as With the above indicator function H(x i |O k ) and distribution p(x i |O k ), a bounded distributionpðx i jO k Þ can be defined asp For additional analysis details, please refer to [19,36] and [37]. However, the major disadvantage of indicator function H(x i |O k ) is that it is a binary function that cannot easily manage uncertainty in data.
To improve the noise robustness, the spatial information is generally incorporated through MRF distribution: where U(P) is the smoothing prior, and Z and T are two constants. Based on Bayes'rules, the probability density function can be written as: Most MRF-based mixture models have been successfully applied to image segmentation by adopting different energy functions U(P). Nguyen and Wu [23] pointed out that the M-step of EM cannot be applied directly to the prior distribution π ik due to the complexity of the loglikelihood function. Thus, the resulting algorithms are computationally complex and have to utilize large amounts of computational power to solve the constrained optimization problem of the prior distribution π ik [23]. To overcome these disadvantages, they introduced a novel factor G ik by defining a multiplication of both posterior probability and prior distributions as follows.
where z mk is the posterior probability and β is the balance parameter to control the smoothing prior. The main advantage of G ik is the ease of implementation and incorporation of the spatial relationships amongst neighborhood pixels in a simpler metric. Then the smoothing prior U (P) is given by: However, the energy U(P) can cause over-smoothing for segmentation and loss of details, especially for regions with abundant textures.

Proposed Model
In order to fit different data shapes, Nguyen et al [19] defined a new distribution p(x i |O k ) to model the component density. Motivated by the bounded asymmetric distribution, in this paper we modify distribution p(x i |O k ) to allow the model to easily incorporate the spatial information, which can be defined as: where L is the number of bounded multivariate Gaussian distribution Ψ(x i |μ kl , S kl ), and η ikl is the weighting factor and satisfies the constraints 0 η ikl 1 and Gaussian distribution Ψ(x i |μ kl , S kl ) is defined as: where Φ(x i |μ kl , S kl ) is the Gaussian distribution defined in Eq (2) and H(x i |O k ) is the indicator function for the bounded support region defined in Eq (3).
Therefore, in this paper, we propose a rough set bounded asymmetric Gaussian mixture model with spatial constraint for image segmentation. First, based on the rough set theory, we utilize our previous work [39] to partition the target image into three rough regions with two adaptively computed thresholds. Second, a new bounded indicator function is proposed to determine the bounded support regions of the observed data based on the above rough regions. Third, to further overcome the impact of noise and reduce over-smoothness for segmentations, two novel prior factors with spatial direction are constructed based on the prior and posterior probabilities of the within-and between-clusters. Finally, to further improve model robustness, for each component, the posterior probability is re-estimated based on the adaptively determined rough regions.

Determination of rough set region
Rough set theory can manage the uncertainty with lower and upper approximations. Specifically, let U 6 ¼ ; be a universe of discourse, and R be an equivalence relationship that can lead to a U partition. By denoting U/R = {X 1 , X 2 , . . ., X n }, where X c is an equivalence class for R, the lower and upper approximations of subset X are defined as: ( The lower approximation is a set where all categories certainly belong to X, and the upper approximation is a set where all categories possibly belong to X. Based on the approximations, three rough regions of X, i.e., the R-positive Po region, R-negative Ne region, and R-boundary Bo region, can be defined as follows [41]: In our previous work [39], based on the distance between each pixel value x i and intensity level g j , two thresholds are adaptively computed to determine the rough regions: where @ i is the neighborhood of pixel i with n pixels. In this paper, the size of neighborhood @ i is set as 3 × 3. J max and J min are the maximum and minimum intensity values of the image, respectively, and J is the number of intensity levels in the image. Thus, we construct distance vector d i = {d i (g 1 ), d i (g 2 ), . . ., d i (g J )} for each pixel i. The two thresholds can be estimated based on mean value d i_mean and minimum value d i_min of vector d i : Based on the distance between each pixel i and the mean value of class k using Eq (13), we can determine the rough regions for each cluster O k as follows: For more analysis and discussion details, please refer to [39].

Construction of bounded support region
As mentioned before, the bounded support regions of observed data were predefined in [19,[36][37][38], and the indicator function of the bounded support region is a binary function that cannot easily manage uncertainty in data. Motivated by the aforementioned observations, in this paper, we propose a new bounded indicator function based on the rough regions. For each label O k , the bounded support region in < D is defined as @ O k , and the new indicator function can be written as:H 8 > > > < > > > : With the new indicator functionH ðx i jO k Þ and distribution p(x i |O k ) in Eq (2), a bounded multivariate Gaussian distributionΨ ðx i jm kl ; S kl Þ can be defined as: where R O k Φðxjm kl ; S kl ÞHðxjO k Þdx is the normalization constant, and it is identified as the share of Φ(x i |μ kl , S kl ) that belongs to support region @ O k [19].
Similar to BAMM [19], each component density in our model is constructed with multiple bounded asymmetric distribution. The corresponding distribution p(x i |O k ) is defined as: where L is the number of bounded multivariate Gaussian distributionΨ ðx i jm kl ; S kl Þ, and η ikl is the weighting factor that satisfies the constraints 0 η ikl 1 and P L l¼1 Z ikl ¼ 1. It should be noted that the above distribution always satisfies the conditions of the probability density [14]: : Therefore, the log-likelihood function of the proposed model can be written as In order to maximize the above likelihood function, two variables z ik and y ikl are defined as follows: The values z ik and y ikl always satisfy the conditions P K k¼1 z ik ¼ 1 and P L l¼1 y ikl ¼ 1, respectively. It is worth mentioning that, based on Bayes'rules, we can treat both variables z ik and y ikl as the posterior probability. Variable z ik indicates the relationship between pixels and clusters that can be treated as the between-cluster relationship. Meanwhile, variable y ikl indicates the relationship between pixels and distribution components that can be treated as the within-cluster relationship because each cluster is modeled with multiple distributions. Consequently, variable z ik can be treated as the posterior probabilities of the between-cluster, whereas variable y ikl can be treated as the posterior probabilities of the within-cluster.
To further improve model robustness, for each component, the posterior probability of each pixel is re-estimated with respect to the rough region where the pixel lies. Thus, the new hidden variablesz ik andỹ ikl can be rewritten as: 8 > > > < > > > : 8 > > > < > > > :

Construction of prior factor
As we mentioned before, the prior factor G ik in [23] plays a role as an average filter on both posterior probability and prior distributions for smoothing noisy images, which may cause over smoothing for the segmentation and lose the details especially for the regions with abundant textures. The smoothing prior in [27] can adaptively select spatial directions, which introduces additional training complexity. To reduce the complexity of the smoothing prior and preserve more details for the segmentations, we propose two novel prior factors E ik and F ikl that consider spatial direction based on the prior and posterior probabilities of the betweenand within-cluster in this paper. Both prior factors are defined as: where @ S Ã k i is the neighborhood of pixel i at direction S Ã k for cluster k, which contains N S Ã k i pixels, and S Ã k is given by: where dist(x i , μ k ) is the Euclidean distance between point i and cluster center μ k , and @ s i is the neighborhood of pixel i at direction s. In this paper, we set S = 4, i.e., four directions (horizontal, vertical and two diagonal) are considered.
More explicitly, taking 3 × 3 neighborhood pixels as example, the filters for each neighboring direction can be constructed as shown in Fig 2. During the algorithm procedure, we only need to operate the convolution over the prior and posterior probabilities with these four predefined filters, and then adaptively select the satisfactory direction based on the differences among the intensity values along each direction. Therefore, the proposed prior factors can efficiently preserve more details.
Then, we incorporate the proposed prior factors into the smoothing prior: MRF distribution p(P) can be given by: where Z and T are set as 1. Therefore, the log-likelihood function of the proposed algorithm can be written in the form: Finally, we can optimize the involved parameters by maximizing the above log-likelihood function.

Parameter estimation
In order to determine label O k for each pixel x i , we need to estimate parameters P = {π ik , η ikl } and Θ = {μ kl , S kl } by maximizing log-likelihood function L(P, Θ|X). By monotonically increasing the logarithm character, maximizing log-likelihood function L(P, Θ|X) can lead to minimizing objective function J(P, Θ|X) as follows: By applying the complete data condition in [24], minimizing the negative log-likelihood function in Eq (31) can also lead to minimizing the objective function E(P, Θ|X) as follows: Then we can apply the EM algorithm to minimize Eq (32) by considering the derivation of function E(P, Θ|X) with respect to each variable. Finally, we can obtain the following updating function for each variable. Please refer to the Appendix for a detailed derivation of the proposed algorithm.
Mean value estimation: Covariance matrix estimation: Prior probability estimation: Consequently, the proposed algorithm for image segmentation is summarized as follows.
Step 5. Check the convergence. Stop the iteration if the convergence criterion is satisfied; otherwise, go to Step 2.
It should be noted that the convergence criterion is generally the distance between the values of objective functions or variables (i.e. means or covariance values) from two successive iterations. In this paper, we utilize the total distance between the mean values obtained from two successive iterations. When this distance becomes smaller than a user specified threshold, which was set to 10 −5 in all algorithms for this study, we think the algorithm converges and stop the iteration.
Unless otherwise specified, the parameters of the proposed algorithm are set as follows: The window size for prior factor construction is 3 × 3. The number of bounded multivariate Gaussian distribution is L = 3. A summary of the parameter settings for each comparison algorithm is listed in Table 1. Please see the corresponding references for more details. It should be noted that for fair comparison, all algorithms, including the proposed model, use the same initializations generated by the k-means algorithm for each testing image. All algorithms were implemented using MATLAB 7.8 platform and tested on a PC (Intel Core i7-4790 CPU, 3.60GHz, 16GB RAM, and 64-bit Windows 8).
The algorithms are compared using synthetic, synthetic and real brain MR, and color images. Fig 3 shows an example of each type of testing image. The segmentation results for synthetic images are quantitatively evaluated by correct classification ratio (CCR) [27], which is defined as: where gt k is the ground truth for cluster k, seg k describes the pixels classified by the algorithm to cluster k and GT ¼ S K k¼1 gt k . CCR ranges from 0 to 1 with a higher value representing a better segmentation result.
(2) The brain MR images are selected from two open sources: the BrainWeb (http://www. bic.mni.mcgill.ca/brainweb) [43] and IBSRv2.0 (https://www.nitrc.org/projects/ibsr) [44] databases. The objective for brain MR image segmentation is to partition the image into three tissue labels: gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). The Dice coefficient (DC) [45] is utilized to quantitatively evaluate the performance for segmenting each type of brain tissue, which is the ratio between the intersection and union of segmented volume S 1 and ground truth volume S 2 The DC value ranges from 0 to 1, with a higher value representing a more accurate segmentation result.  [46]. The probabilistic rand index (PRI) [47] is used to assess the segmentation performances on natural images. PRI between segmentation map S seg to be evaluated and a set of M ground truth images S gt = {S 1 , . . ., S M } is given by: where c ij = 1 if pixels i and j belong to the same cluster, and c ij = 0 if pixels i and j belong to different clusters. PRI takes values between 0 and 1, with a higher value representing a more accurate segmentation result.

Illustration of proposed algorithm
In order to show the advantages and performance of the proposed algorithm in detail, we first segmented a brain MR image that is the 90-th slice of the 3D brain MR image with 9% noise   [19] has already proven its superior performance over GMM, SMM, and GGMM, in this section, we only present the estimated distributions obtained by employing f(x i |μ k , S k ) (i.e., the Gaussian distribution in the GMM model, Eq (2)), Ψ(x i |μ kl , S kl ) (i.e., the bounded Gaussian distribution in BAMM model, Eq (10)), and Ψ ðx i jm kl ; S kl Þ (i.e. the proposed distribution, Eq (17)).
In Fig 5, the noisy image with size 128 × 128 shown in Fig 5(a) is used to compare the performance of the proposed algorithm with GMM and BAMM. This image contains three labels (K = 3). The number of bounded multivariate Gaussian distributions for BAMM and the proposed algorithm is set as L = 3. The bounded region for BAMM is set as [0, 255] for each cluster. The ground truth of this image is shown in Fig 5(b), which presents the estimated density function obtained by GMM, BAMM, and the proposed algorithm based on the histogram of the observed data. Fig 5(d)-5(f) show the ground truth distributions for each class along with the estimated distributions of each corresponding cluster. From the histogram of observed data and corresponding ground truth distributions, we find that the image data are non-Gaussian, non-symmetric and bounded support data. Without the bounded constraint, GMM performance is not sufficiently satisfactory because the estimated density function cannot fit well with the histogram of observed data. Compared with GMM, both BAMM and the proposed algorithm better fit the observed data because of the introduction of the bounded multivariate Gaussian distribution. It can also be visualized that distributionΨ ðx i jm kl ; S kl Þ in Eq (17) allows flexibility for a better fit of the observed data compared with distribution Ψ(x i | μ kl , S kl ) of BAMM in Eq (10).
Similar to Figs 5 and 6 shows a gray natural image of size 481 × 321 that is used to compare the performance of the proposed algorithm with GMM and BAMM. This image contains three labels (K = 3). In comparison, the estimated distribution of the proposed algorithm is much better than that of the others.

Segmentation of synthetic images
In the second experiment, a synthetic image (image size: 128 × 128) as shown in Fig 7(a), is used to compare the performances among different algorithms. The image contains four labels with luminance values [0, 1/3, 2/3, 1]. The noisy image with Gaussian noise (0 means, 0.07 variance) is shown in Fig 7(b). From Fig 7(c)-7(g), we present the segmentation results obtained by employing the proposed algorithm, SCGM-EM, FRSCGMM, BAMM and BGGMM, respectively. Without considering any spatial information, the segmentation accuracies of BAMM and BGGMM are quite poor. The anti-noise ability for FRSCGMM is limited. Both SCGM-EM and the proposed algorithm obtain better performances. Nevertheless, the proposed method obtains higher CCR, especially for the pixels around the boundaries.

Segmentation of brain MR images
Because of the utilization of our previous work [39], in this section, we supplement the generalized rough fuzzy c-means (GRFCM) algorithm [39] as a comparison method. It should be noted that GRFCM can overcome the impact of intensity inhomogeneity in brain MR images, but all other comparison algorithms can only overcome the impact of noise. Therefore, in this experiment, we apply all algorithms to segment the synthetic T1-weighted 1 mm brain MR images selected from BrainWeb, which only contain different noise levels. For a fair comparison, the intensity inhomogeneity estimation part in GRFCM is removed.
Three sample brain MR images with 9% noise (80-th axial, sagittal and coronal slice), along with their segmentation results and ground truths, are shown in Fig 9. Similar to the proposed algorithm, both SCGM-EM and FRSCGMM construct spatial information based on the posterior and prior probabilities. However, without considering any directional differences, these two algorithms can obtain over-smoothness segmentations and lose details, especially for the CSF tissue. Meanwhile, both BAMM and BGGMM cannot well distinguish noisy pixels without considering any spatial information. GRFCM is not robust to noise, and the corresponding segmentations are not sufficiently smooth. By comparing the ground truth with the segmentations obtained with all algorithms, we see that the proposed algorithm visually obtains better results.
Segmentation accuracy for each tissue is measured in terms of DC values, and the results are listed in Table 2, which further demonstrates superior performance of the proposed algorithm. It is worth mentioning that the DC values of axial and sagittal CSF segmentation for BAMM and BGGMM are slightly higher than other algorithms because of the low percentage and abundant texture details of the CSF tissue. For any spatially constrained algorithm, the introduced spatial information can improve segmentation accuracy for images occupied by noise, but this can lead to smoothing the texture details in the image. Therefore, it is a dilemma to balance the trade-off between the anti-noise ability and over-smoothness for image textures. By comparing with two other spatially constrained algorithms (SCGM-EM and FRSCGMM), we see that the proposed algorithm achieves better trade-off and balance.
To statistically show the significance of the proposed algorithm, we apply the previous six algorithms to segment 10 axial, 10 sagittal and 10 coronal (from slice 86th to 95th) MR images for each noise level, where the level ranges from 3% to 9%. The statistical results (means and   standard deviations of DC values) are shown in Fig 10(a)-10(c). Moreover, the average CCR value over the entire segmentation results is show in Fig 10(d). From Fig 10, we can observe that the proposed algorithm produces more accurate segmentation (higher means) and has better robustness to noise (lower standard deviations). As indicated before, without considering any spatial information, BAMM and BGGMM produce satisfactory results when the noise level in the image is low (3%). SCGM-EM, FRSCGMM, and the proposed algorithm sacrifice segmentation accuracy for image details in orders to achieve better anti-noise ability. Therefore, the DC values of BAMM and BGGMM for the CSF tissue are slightly higher than the other three algorithms. By increasing the noise level, the performances of BAMM and BGGMM might decrease dramatically, especially for GM and WM tissues. Table 3 lists the statistical results (p-value of paired-sample t-test) between those five methods and the proposed algorithm on the testing images utilized in Fig 10. It is observed that, considering 0.05 as the level of significance, the proposed algorithm provides significantly better segmentation results with respect to both DC and CCR on the BrainWeb dataset.
In the next experiment, we apply all algorithms on the IBSR v2.0 data set [44], which contains 18 3D images. It is worth mentioning that BAMM [19] has been verified to outperform widely used algorithms, i.e., EMS [45] and SPM [48]. Therefore, in this comparison experiment, we only need to compare BAMM with the proposed algorithm in order to demonstrate the superior performance of the latter. Fig 11 shows a 3D slice view of the real dataset (IBSR04). The image shown in Fig 11(b) is the ground truth of the original image. Fig 11(c)-11(h) show the results obtained by implementing the proposed method, GRFCM, SCGM-EM, FRSCGMM, BAMM, and BGGMM. It is obvious that case IBSR04 contains low contrast between the GM and CSF tissues. BAMM and BGGMM cannot well distinguish the GM and CSF tissues when low contrast occurs. SCGM-EM and FRSCGMM lead to over-smooth segmentations. Without estimation of the intensity inhomogeneity, GRFCM also fails to distinguish the tissues with low contrast. By compareing with these methods, we see that the effect of noise and low contrast on the final segmentation of our algorithm is small and has the most similarity with the ground truth. Then we tested all those algorithms on 18 cases in the IBSR v2.0 dataset. The segmentation results were assessed in term of DC, and the variation of DC values was depicted in Fig 13. The statistical results, including mean, standard deviation (STD), and p-value of the t-test, of those methods on 18 3D brain images (from IBSR01 to IBSR18) were listed in Table 4. According to the obtained mean and STD, we can tell that the proposed algorithm steadily outperforms other five approaches. Based on the p-values, we find that, considering 0.05 as the level of significance, the proposed algorithm provides significantly more accurate segmentations on the IBSR v2.0 dataset than other five algorithm.

Segmentation of color images
In this section, we test all comparison methods on color images in the Lab color space selected from Berkeley dataset. In Fig 14, we compare the segmentation results of three real-world color images. The first row shows the original images with image ID "105019,", "100007," and "28083," from left to right, and the corresponding number of clusters is 2, 3, and 4. The segmentation results obtained by SCGM-EM, FRSCGMM, BAMM, BGGMM and the proposed SCGAGMM algorithm are shown from the second to sixth row. The first image (ID: 105019) is segmented into two classes: "lions" and background. As shown in the first row of Fig 14, SCGM-EM, FRSCGMM, BAMM and BGGMM cannot extract the "lions" accurately from the background. Parts of the background region are misclassified as the target. In comparison, the proposed algorithm can successfully extract the target from the background. We attempted to segment the second image (ID: 100007) into three classes, and the proposed method obtains better classification results with more detail. The third image (ID: 374067) consists mainly of four color components: "grassland", "trees", "mountain"and "sky". All comparison methods cannot distinguish well the "trees"or "mountain"from "grassland". In comparison, our method can successfully segments all objects, especially for the "mountain"region, and does not result in obvious misclassifications.
Finally, a set of color images is tested to evaluate the performance of the proposed algorithm against the SCGM-EM, FRSCGMM, BAMM and BGGMM methods. Table 5 lists the PRI values obtained with all methods for 30 real world images. The statistical results (SR), including  Table 5. From t-test results, we find that considering 0.05 as the level of significance, the proposed algorithm provides significantly better segmentations with respect to the PRI index. It is evident from the results that the proposed algorithm outperforms other methods with higher PRI values in most cases.

Comparison of computational complexity
The computational complexity of those five algorithms was compared in Table 6, where T is the number of iterations when the algorithm converges, N is number of pixels in an image, D is the dimension of each pixel, K is the number of clusters, L is the number of distributions and N @ i is the number of pixels in the neighborhood @ i . It should be noted that the computational complexity for the EM algorithm is of the order O(NKD 2 ) for each iteration [49].  Table 6 also gives the converging time, number of iterations and time-cost per iteration of five algorithms obtained by applying each of them to 100 2D brain MR images, which have a size of 217 × 181 and were selected from the BrainWeb dataset. In this comparative experiment, we checked for the convergence of the parameter values, set the stopping criteria to ε = 10 −5 and executed each algorithm with 100 iterations (Intel Core i7-4790 CPU, 3.60GHz, 16GB RAM, 64-bit Windows 8, and Matlab Version 7.8). It is worth mentioning that the proposed algorithm was performed in the MATLAB environment without any particular code optimization.

Conclusion
To overcome the limitations involved in most GMM-based algorithms, in this paper, we proposed a rough set bounded asymmetric Gaussian mixture model with spatial constraint for image segmentation. Based on the rough set theory, a new bounded indicator function was proposed to determine the bounded support regions of the observed data. The bounded indicator and posterior probability of a pixel that belongs to each sub-region were estimated based on the rough regions. The within-and between-cluster spatial constraints were introduced by incorporating the spatial information with adaptively selected direction in order to reduce over-smoothness for segmentations. Experimental results demonstrated that the proposed algorithm is flexible to fit the data shapes, and robust to noise, which makes our method be capable of producing more accurate segmentation results comparing with several state-of-theart algorithms. Future work will be devoted to reducing the complexity of the proposed algorithm.

Mean value estimation
Considering the derivation of function E(P, Θ|X) in Eq (32) with respect to μ kl , we have @EðP; YjXÞ @m kl ¼ À where the term R O k Φðxjm kl ; S kl ÞH ðxjO k Þðm kl À xÞdx is the expectation of functionHðxjO k Þ Â ðm kl À xÞ under distribution Φ(x|μ kl , S kl ), which can be approximated as [14,19,36] where s mkl * Φ(x|μ kl , S kl ) is the random vector drawn from distribution Φ(x|μ kl , S kl ), and M is the number of random vectors s mkl [19]. In this paper, we set M = 10 6 for all experiments. Similarly, the term R O k Φðxjm kl ; S kl ÞHðxjO k Þdx can be approximated as [14,19,36]: Based on Eqs (41) and (42), @E(P, Θ|X)/@μ kl from Eq (40) The solution @E(P, Θ|X)/@μ kl = 0 yields the updating function for μ kl during the iterations: By setting @E(P, Θ|X)/@S kl = 0, we obtain the updating function for S kl during the iterations: Prior probability estimation Because the prior distribution of between-cluster π ik satisfies constraint P K k¼1 p ik ¼ 1, Lagrange's multiplier τ i is used to enforce these constraints for each data point: Constraint P K k¼1 p ik ¼ 1 allows: Similarly, we obtain the updating function for the prior distribution of within-cluster η ikl under constraint P L l¼1 Z ikl ¼ 1: 30916011324, and in part by China Postdoctoral Science Foundation under Grants 2014T70525 and 2013M531364. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.