Similarity measure and domain adaptation in multiple mixture model clustering: An application to image processing

This paper considers three crucial issues in processing scaled-down images: the representation of partial images, the similarity measure, and domain adaptation. Two Gaussian mixture model based algorithms are proposed to effectively preserve image details and avoid image degradation. Multiple partial images are clustered separately through Gaussian mixture model clustering with a scan and select procedure to enhance the inclusion of small image details. The local image features, represented by maximum likelihood estimates of the mixture components, are classified by using the modified Bayes factor (MBF) as a similarity measure. The detection of novel local features by the MBF triggers domain adaptation, which changes the number of components of the Gaussian mixture model. The performance of the proposed algorithms is evaluated with simulated data and real images, and they are shown to perform much better than existing Gaussian mixture model based algorithms in reproducing images with a higher structural similarity index.


Introduction
The processing of an image as a whole becomes more challenging as the image data size increases. In many applications of image analysis, it is not feasible to process an entire image of a large size. The most common approach to addressing this problem is to scale down the data size so that the computational complexity can be reduced. There are two popular methods for scaling down image data size: (i) sampling, which starts with a subset of the image data [1][2][3][4], and (ii) partitioning into blocks, which divides an image into m x n blocks [5][6][7][8][9][10]. Although these methods are simple, they have been developed into popular techniques, for example bag-of-features [11] and block based compressed sensing [12].
The basic notion of scaling down data through sampling is to apply an extended clustering scheme [4], where the clustering algorithm is first performed on a manageable sample of the whole data set, and then the results are extended to classify the remaining data. The main drawback of this simple method is that the clusters captured in the training sample may not represent all the clusters in the whole data set. In other words, the source domain and the target domain are different. Without domain adaptation during the extension to classify the whole data set, the method tends to miss small but important information. The shortcoming of overlooking small localized variation of image components may not be significantly reflected in global distortion measures such as mean square error (MSE) or signal to noise ratio, but it is an important issue to address, especially in medical imaging, where there are often only subtle differences in visual features between normal and pathological images [13]. In the Gaussian mixture model (GMM) framework, some works have been proposed to improve the clustering result of the selected sample. For example, algorithms for splitting clusters based on statistical tests have been proposed to improve the accuracy of the clusters captured by the training sample [14][15]. However, these algorithms lead to the discovery of false clusters. Variants of the expectation-maximization (EM) algorithm have been proposed to improve the capture of small clusters [16][17] and the identification of overlapping clusters [18][19] in the training sample. However, this does not solve the problem of lack of representativeness of the training sample. The work in [1] improves the unstable results of the sampling based algorithm by selecting several best models [20][21] based on the training sample data, and then running several EM steps on the full data set to select the final best model.
Their recommendation of using multiple samples as unsupervised training sets motivates the development of the first algorithm in this paper, FlexClustS. The proposed sampling based GMM algorithm performs domain adaptation among the clustering results from multiple samples, and therefore improves on the existing algorithms, especially [20][21], in four aspects: (i) it recovers clusters that have not been identified in the training sample, (ii) it recovers small but important clusters, (iii) it preserves image features better, and (iv) it does not unrealistically pre-define the number of clusters in the whole data set.
On the other hand, algorithms that work on image data divided into blocks normally consist of two phases. In the first phase, each block of the image data is compressed or summarized and represented by descriptors (or prototypes) of features. In the second phase, the descriptors collected from all the blocks are combined based on a particular similarity measure such as the Euclidean, Mahalanobis or Manhattan distance. One of the most noticeable degradations of this method is blocking artifacts [22][23]. These occur when the local features from each block of the image are processed independently, without taking into account the information in adjacent blocks, which results in discontinuities at the block boundaries. In the existing GMM based algorithms, each block of data is normally summarized by the k-means method, and each resulting subcluster is represented by a descriptor of triplet statistics (mean, variance and number of data points). Then, a variant of the expectation-maximization (EM) algorithm is used to fit the descriptors from all the blocks of data to a GMM [24][25]. There are shortcomings in these algorithms. First, using k-means as the partial image representation model does not capture the image features of each block well if the pixel clusters are not spherical in shape. Second, GMM clustering based on a variant of EM increases the computational complexity, especially if the cumulative number of descriptors from all the blocks is large. Therefore, this paper proposes a second algorithm based on multiple blocks of image data, FlexClustB, to improve on the existing algorithms, especially [24][25], in three aspects: (i) it preserves image features better, (ii) it reduces computational complexity during clustering of all descriptors by using a similarity measure, and (iii) it avoids blocking artifacts.
This paper proposes two GMM based algorithms, termed FlexClustS (Flexible number of clusters, sampling based) and FlexClustB (Flexible number of clusters, block based). For ease of explanation in the following sections, FlexClustS and FlexClustB are grouped under the name FlexClust. The two algorithms are quite similar except for the method used to scale down the data size. A brief description of the two proposed algorithms is as follows. First, the image data is scaled down by dividing it into multiple samples or m x n blocks for FlexClustS and FlexClustB respectively. A scan through and selection procedure is proposed for initialization of the GMM, and each sample or block of the image is represented by a GMM. The idea of the scan through and selection procedure is adapted from [26][27]: it isolates the small details of the image and over-represents these pixels to increase the chance of their detection. The GMM is chosen in this paper because it has been proven to be effective for pattern representation and it preserves image features well, as exemplified in many applications: classification of 12-lead electrocardiograms (ECG) [28], segmentation of images [29] and of brain magnetic resonance images [30]. This paper proposes to use the maximum likelihood estimates (MLEs) of the GMM as the local image descriptor for each sample or block. Next, the descriptors of MLEs resulting from multiple GMM clusterings are aggregated into a compact representation of the entire image by a proposed mixture model distribution. This is done by considering the image representation of one of the samples or blocks as the source domain, and the representations of the remaining samples or blocks as the target domain to be classified. The classification is based on a proposed pairwise similarity measure known as the modified Bayes factor (MBF), which is an adapted Bayesian model selection criterion.
If the MBF suggests that any descriptor has novel local features, the proposed model is updated by allowing domain adaptation to change the number of mixture components.
The main contributions of this paper are summarized as follows:
1. The introduction of two algorithms, FlexClustS and FlexClustB, which work on scaled-down image data more effectively in preserving the image details and avoiding the problem of blocking artifacts.
2. A Gaussian mixture model with a scan through and selection procedure for feature extraction, which improves the chance of detecting small details of the image.
3. A modified Bayes factor as a similarity measure, which makes use of the partial image descriptors and detects novel local image features for domain adaptation.
4. A mixture model distribution for compact representation of the entire image, which takes care of domain adaptation when classifying the local image descriptors obtained from samples or blocks of the image.
The remainder of the paper is organized as follows. Section 2 briefly reviews the theoretical background of the Gaussian mixture models related to the proposed algorithms. Section 3 describes the details of the FlexClustS and FlexClustB algorithms. Section 4 presents the results on simulated data and the application to real images. Finally, the discussion and conclusion are presented in Section 5.

Theoretical background
In this section, we describe the Gaussian mixture model since it is closely related to the proposed algorithm. From this section onward, the components of the mixture model also refer to the groups, clusters or classes of pixels.

Gaussian mixture model for clustering
In this paper, the Gaussian mixture model (GMM), with improvement in initialization, is used to compress the samples or blocks of image through clustering. Performing clustering via mixture models not only has the advantage of having a means of estimating the parameters of the model by employing the expectation-maximization (EM) algorithm, but also helps to determine the number of clusters through the comparison of the Bayesian Information Criterion (BIC) [31].
In mixture model clustering of image data, the d-dimensional random pixels of size n, $x_1, \ldots, x_n$, are assumed to have been generated from a mixture of a finite number, say G, of underlying probability distributions. The mixture density for each $x_i$ is expressed as
$$f(x_i \mid \Psi) = \sum_{k=1}^{G} \pi_k f_k(x_i \mid \theta_k), \tag{1}$$
where $\pi_k$ is the non-negative mixture proportion for the kth component, which satisfies $\sum_{k=1}^{G} \pi_k = 1$, and $\Psi = (\pi_1, \ldots, \pi_G, \theta_1, \ldots, \theta_G)$ is the vector of all the unknown parameters. In the GMM, the parameter $\theta_k$ consists of a mean vector $\mu_k$ and a covariance matrix $\Sigma_k$, and the density has the form
$$f_k(x_i \mid \mu_k, \Sigma_k) = (2\pi)^{-d/2} |\Sigma_k|^{-1/2} \exp\left\{-\tfrac{1}{2}(x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k)\right\}, \tag{2}$$
where $|\Sigma_k|$ is the determinant of the covariance matrix.
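The MLEs of the parameters of the mixture model can be obtained iteratively by applying the EM algorithm [32]. In clustering, the EM algorithm is a general approach to maximizing the likelihood function in the presence of a set of unobservable group indicators $z_1, \ldots, z_n$, which are treated as missing data. Each of these indicators has the form $z_i = (z_{i1}, \ldots, z_{iG})$ with $z_{ik} = 1$ if $x_i$ belongs to group k, and $z_{ik} = 0$ otherwise. Therefore, the complete data log-likelihood of the GMM is given by
$$\log L_c(\Psi) = \sum_{i=1}^{n} \sum_{k=1}^{G} z_{ik} \log\left[\pi_k f_k(x_i \mid \mu_k, \Sigma_k)\right]. \tag{3}$$
An iteration of the EM algorithm for the GMM is as follows. In the E-step of the tth iteration, calculate the conditional probabilities $z_{ik}^{(t)}$ that $x_i$ arises from the kth mixture component, for the current values of the mixture parameters:
$$z_{ik}^{(t)} = \frac{\pi_k^{(t)} f_k(x_i \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_{j=1}^{G} \pi_j^{(t)} f_j(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})}. \tag{4}$$
The M-step of the (t+1)th iteration updates the mixture parameter estimates $\pi_k$, $\mu_k$ and $\Sigma_k$ by maximizing Eq (3) with the values of $z_{ik}^{(t)}$ computed from Eq (4) substituted for $z_{ik}$. For an unconstrained covariance matrix the updates are
$$\pi_k^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n} z_{ik}^{(t)}, \qquad \mu_k^{(t+1)} = \frac{\sum_{i=1}^{n} z_{ik}^{(t)} x_i}{\sum_{i=1}^{n} z_{ik}^{(t)}}, \qquad \Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{n} z_{ik}^{(t)} (x_i - \mu_k^{(t+1)})(x_i - \mu_k^{(t+1)})^\top}{\sum_{i=1}^{n} z_{ik}^{(t)}}. \tag{5}$$
Let the MLEs of the GMM be $\hat{\Psi} = (\hat{\pi}_k, \hat{\mu}_k, \hat{\Sigma}_k)$, for $k = 1, 2, \ldots, G$. The pixel $x_i$ can be assigned to the component of the mixture with the highest estimated posterior probability $\hat{z}_{ik}$, where
$$\hat{z}_{ik} = \frac{\hat{\pi}_k f_k(x_i \mid \hat{\mu}_k, \hat{\Sigma}_k)}{\sum_{j=1}^{G} \hat{\pi}_j f_j(x_i \mid \hat{\mu}_j, \hat{\Sigma}_j)}. \tag{6}$$
One of the advantages of using mixture model clustering is that the model with the appropriate number of clusters or mixture components may be chosen by using the Bayesian Information Criterion (BIC) [33],
$$\mathrm{BIC} = -2 \log L(\hat{\Psi}) + p \log n, \tag{7}$$
where $\log L(\hat{\Psi})$ is the maximized log-likelihood and p is the number of functionally independent parameters to be estimated in the MLEs of the GMM. The selected model is the one with the minimum BIC.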
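The E-step, M-step and BIC comparison described above can be sketched for the univariate case (d = 1) as follows. This is a minimal illustration under stated assumptions, not the MCLUST implementation used in the paper: the function names, the unit initial variances, the fixed iteration count and the small variance floor are all choices made for this sketch.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate Gaussian density (the d = 1 case of the component density)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, variances):
    """Posterior membership probabilities z_ik for every point and component."""
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

def m_step(data, resp):
    """Update proportions, means and variances from the memberships."""
    n, g = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for k in range(g):
        n_k = sum(r[k] for r in resp)
        mu = sum(r[k] * x for r, x in zip(resp, data)) / n_k
        var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, data)) / n_k
        weights.append(n_k / n)
        means.append(mu)
        variances.append(max(var, 1e-6))  # assumed floor against a collapsed component
    return weights, means, variances

def log_likelihood(data, weights, means, variances):
    """Observed-data log-likelihood of the fitted mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def fit_gmm(data, means, n_iter=50):
    """Run a fixed number of EM iterations from the given initial means."""
    g = len(means)
    weights, variances = [1.0 / g] * g, [1.0] * g
    for _ in range(n_iter):
        resp = e_step(data, weights, means, variances)
        weights, means, variances = m_step(data, resp)
    return weights, means, variances

def bic(data, weights, means, variances):
    """BIC = -2 log L + p log n, with p = 3G - 1 free parameters when d = 1."""
    p = 3 * len(weights) - 1
    return -2.0 * log_likelihood(data, weights, means, variances) + p * math.log(len(data))
```

For two well-separated groups of values, fitting G = 1 and G = 2 and comparing `bic(...)` selects the two-component model, since the smaller BIC is preferred.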

Gaussian mixture model for classification
When sampling is used for scaling down the data size for GMM based image processing, the common procedure is to perform unsupervised training through GMM clustering of the pixel sample as described in Section 2.1, and then use discriminant analysis to classify the remainder of the image pixels [20][21]. Basically, the GMM for classification or discriminant analysis applies one E-step to the remainder of the image data using the parameters obtained from the clustered sample. The posterior probability that a pixel $x_i$ belongs to the kth class is calculated by
$$\hat{z}_{ik} = \frac{\hat{\pi}_k f_k(x_i \mid \hat{\mu}_k, \hat{\Sigma}_k)}{\sum_{j=1}^{G} \hat{\pi}_j f_j(x_i \mid \hat{\mu}_j, \hat{\Sigma}_j)}, \tag{8}$$
and the pixel is classified to the class in which it has the highest posterior probability.
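A minimal univariate sketch of this one-E-step classification is given below. The function names are hypothetical, and log densities are used only to avoid numerical underflow; because the normalising constant of the posterior is shared by all classes, taking the largest log joint density gives the same label as taking the largest posterior probability.

```python
import math

def log_joint(x, weight, mean, var):
    """log(pi_k * f_k(x)) for a univariate Gaussian component."""
    return math.log(weight) - 0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def classify(pixels, weights, means, variances):
    """Assign each pixel to the class with the highest posterior probability,
    keeping the parameters learned from the training sample fixed."""
    labels = []
    for x in pixels:
        scores = [log_joint(x, w, m, v) for w, m, v in zip(weights, means, variances)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels
```

With classes centred at 0 and 10, a pixel at 50 is still forced into the second class. This illustrates the limitation discussed in Section 1: without domain adaptation, a cluster absent from the training sample cannot be recovered.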

Gaussian mixture model for summarized data
Dividing an image into blocks is always followed by a compression step in which the image data of each block are summarized by a specific set of quantities (a prototype or descriptor). The Gaussian mixture model for summarized data was introduced by [24][25]. The basic notion is to perform a variant of the EM algorithm on descriptors of triplet statistics (mean, variance and number of data points), that is, sufficient statistics. Assume that a data set has been summarized into a set of m descriptors of sufficient statistics $(\bar{x}_i, S_i, n_i)$, for $i = 1, 2, \ldots, m$, where $\bar{x}_i$ and $S_i$ are the mean vector and covariance matrix of the summarized data points for descriptor i, and $n_i$ is the number of data points. The corresponding complete log-likelihood for the prototype set, $\log L_c^s$ in Eq (9), is a function of $\Psi$ given the descriptors and $z = (z_1^\top, \ldots, z_m^\top)^\top$, which denotes the component membership of the m descriptors. The sufficient EM algorithm operates on the sufficient statistics to maximize the complete descriptor log-likelihood in Eq (9) [25]. For the tth iteration, the mean vectors $\bar{x}_i$ are used in the E-step to calculate the expected component memberships $z_{ik}^{(t)}$ of the descriptors, which are equal to their posterior probabilities
$$z_{ik}^{(t)} = \frac{\pi_k^{(t)} f_k(\bar{x}_i \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_{j=1}^{g} \pi_j^{(t)} f_j(\bar{x}_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})}. \tag{10}$$
In iteration t+1, the weights $n_i z_{ik}^{(t)}$, which reflect the descriptor sizes, are introduced. The component means are calculated as the weighted sum of the prototype means,
$$\mu_k^{(t+1)} = \frac{\sum_{i=1}^{m} n_i z_{ik}^{(t)} \bar{x}_i}{\sum_{i=1}^{m} n_i z_{ik}^{(t)}}, \tag{11}$$
the component covariance matrices are calculated by decomposing them into the sum of the weighted between-descriptor and within-descriptor sums of squares and products matrices $B_{SSP,k}^{(t)}$ and $W_{SSP,k}^{(t)}$ respectively,
$$\Sigma_k^{(t+1)} = \frac{B_{SSP,k}^{(t)} + W_{SSP,k}^{(t)}}{\sum_{i=1}^{m} n_i z_{ik}^{(t)}}, \quad B_{SSP,k}^{(t)} = \sum_{i=1}^{m} n_i z_{ik}^{(t)} (\bar{x}_i - \mu_k^{(t+1)})(\bar{x}_i - \mu_k^{(t+1)})^\top, \quad W_{SSP,k}^{(t)} = \sum_{i=1}^{m} n_i z_{ik}^{(t)} S_i, \tag{12}$$
and the mixing proportions are given by
$$\pi_k^{(t+1)} = \frac{\sum_{i=1}^{m} n_i z_{ik}^{(t)}}{\sum_{i=1}^{m} n_i}, \tag{13}$$
for all the mixture components $k = 1, \ldots, g$.
The number of mixture components is assessed by a variant of the BIC in which the likelihood is the sufficient likelihood, an approximation of the likelihood of the original data, d is the number of parameters to be estimated for the mixture, and n is the number of single observations.
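The weighted mean and the between/within decomposition of the variance can be checked numerically. The sketch below is a univariate illustration with hypothetical function names, and it assumes that each descriptor variance is the maximum likelihood variance (divisor n). Under that assumption, with all memberships equal to one, the update from descriptors alone reproduces the mean and variance of the pooled raw data exactly.

```python
def summarize(points):
    """Descriptor (mean, ML variance, size) of a group of scalar data points."""
    n = len(points)
    mean = sum(points) / n
    var = sum((x - mean) ** 2 for x in points) / n  # assumption: divisor n, not n - 1
    return mean, var, n

def component_update(descriptors, memberships):
    """Mean, variance and effective size of one component, computed from
    (mean, var, n) descriptors weighted by n_i * z_ik."""
    weights = [n * z for (_, _, n), z in zip(descriptors, memberships)]
    total = sum(weights)
    mu = sum(w * m for w, (m, _, _) in zip(weights, descriptors)) / total
    between = sum(w * (m - mu) ** 2 for w, (m, _, _) in zip(weights, descriptors))
    within = sum(w * v for w, (_, v, _) in zip(weights, descriptors))
    return mu, (between + within) / total, total
```

The third value returned is the effective component size; dividing it by the total number of observations gives the mixing proportion.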

The FlexClust algorithms
To overcome the drawbacks of the algorithms for scaled down image data as described in Section 1, this paper proposes the FlexClustS and FlexClustB algorithms. The main idea of the proposed algorithms is to iterate over samples or blocks of the image data set. The three main modules in the algorithm are: (1) representation of the multiple samples or blocks of image using GMM guided by scan through and selection procedure, (2) calculation of the pairwise similarity measures of the descriptors of samples or blocks, and (3) domain adaptation to obtain a GMM compact representation of the entire image. The overview of the algorithms is given in Fig 1. The three modules of the algorithms will be described in the following sub-sections and the summary of the algorithms will be presented at the end of this section.

Represent multiple samples or blocks of image
In this paper, the scan through and selection procedure is proposed to improve the inclusion of the small details of the image. These pixels are isolated from the relatively small pixel clusters as follows:
1. Apply k-means clustering to a block of $n_b$ image data points, with the number of pixel clusters set a priori as $k = 0.01\,p_s N$. The aim of k-means is to divide the $n_b$ image data points into k pixel clusters so as to minimise the objective function
$$J = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left\| x_i^{(j)} - c_j \right\|^2, \tag{15}$$
where $\| x_i^{(j)} - c_j \|^2$ is the squared Euclidean distance between $x_i^{(j)}$ and $c_j$, $x_i^{(j)}$ is the data point $x_i$ from cluster j, and $c_j$ is the centre of cluster j.
2. Suppose there are $n_j$ data points in cluster j. If the proportion of data points in cluster j ($= n_j/N$) is less than a threshold, say $\varepsilon = 0.01$, consider them to come from a small pixel cluster and place them in the set $Q_s$.
3. Repeat steps 1 and 2 until all 100 blocks of data points have been scanned through. Replicate the set $Q_s$ q times to over-represent it and form the set Q. This increases the chance of detecting small pixel clusters in the later step. Adjust q according to the memory available for computation. If the image has a very fine structure with very small pixel clusters to be recovered, increase $p_s$ and reduce k of k-means to increase the chance of capturing them in the set $Q_s$.
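The three steps above can be sketched as follows for scalar pixel values. This is a simplified illustration, not the authors' implementation: the function names, the deterministic initialisation of the k-means centres at evenly spaced order statistics, and the fixed number of Lloyd iterations are assumptions, and the cluster proportion is taken relative to the block being scanned.

```python
def kmeans_1d(points, k, n_iter=20):
    """Plain Lloyd iterations on scalars; centres start at evenly spaced order statistics."""
    ordered = sorted(points)
    centres = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda j: (x - centres[j]) ** 2)
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centre.
        centres = [sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)]
    return clusters

def scan_and_select(blocks, k, eps=0.01, q=3):
    """Scan every block, collect pixels from clusters whose share of the block
    is below eps (the set Q_s), and replicate them q times (the set Q)."""
    small = []
    for block in blocks:
        for cluster in kmeans_1d(block, k):
            if cluster and len(cluster) / len(block) < eps:
                small.extend(cluster)
    return small * q
```

For a block made of two large groups of values plus two isolated outliers, only the outliers fall below the 1% threshold, so the returned list holds q copies of those two values.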
Consider now the image as being divided into multiple samples or blocks. The set Q is added to one of the samples or blocks of the image. Let $S_1 = \{x_1, x_2, \ldots, x_{n_1}\}$ be the combination of Q and the first sample or block of the image data set. $S_1$ is represented by the descriptors of GMM MLEs as follows:
1. Fit $S_1$ to a $g_1$-component Gaussian mixture model using the complete log-likelihood function in Eq (3). Repeat the step of fitting a Gaussian mixture model for the remaining samples or blocks.
2. Let the MLEs of the parameter set for the t-th sample or block of image data be $\hat{\Psi}_t = (\hat{\pi}_{t1}, \ldots, \hat{\pi}_{tg_t}, \hat{\theta}_{t1}, \ldots, \hat{\theta}_{tg_t})$, where $\hat{\theta}_{tk} = (\hat{\mu}_{tk}, \hat{\Sigma}_{tk})$ is the vector of the MLEs of the mean and full covariance matrix, and $\hat{\pi}_{tk}$ is the MLE of the mixture proportion, for the k-th cluster from the t-th portion, where $k = 1, \ldots, g_t$.
3. The MLEs of each individual cluster are estimated approximately from the decomposition of the mixture model components. Thus, for the t-th portion, the MLEs of the parameter set are decomposed into the mixture components $(\hat{\mu}_{tk}, \hat{\Sigma}_{tk}, n_{tk})$, where $n_{tk} = \hat{\pi}_{tk} n_t$ is the kth cluster size.
Therefore, each image sample or block is now represented by the GMM MLEs of its pixel clusters, given by the descriptors
$$\hat{\Psi}_{tk} = (\hat{\mu}_{tk}, \hat{\Sigma}_{tk}, n_{tk}), \tag{16}$$
where $k = 1, \ldots, g_t$, and $g_t$ is the number of clusters in the t-th sample or block.

Similarity measure: A modified Bayes factor
In this paper, a similarity measure using a Bayesian approach based on model selection is proposed to distinguish between homogeneous and heterogeneous clusters of pixels from different portions of image data. The proposed modified Bayes factor (MBF) works on the descriptors obtained from the previous step. For simplicity but without loss of generality, consider the descriptors from two portions of the image data. Let cluster i and cluster j be the pixel clusters from the first and second portion of the image data respectively. An assumption is made in developing the MBF: (i) if the two clusters i and j are similar, the clusters are merged for the later step, and (ii) if the two clusters i and j are dissimilar, the two clusters are kept as they are for the later step. This notion implies a choice between two models with the number of clusters k = 1 and k = 2 respectively, that is,
$$M_1: k = 1 \ \text{if cluster } i \text{ and cluster } j \text{ are similar and can be merged}, \qquad M_2: k = 2 \ \text{if cluster } i \text{ and cluster } j \text{ are dissimilar and cannot be merged};$$
see Section 3.3 for more details. We choose the Bayesian approach for the above problem as it has advantages over the alternative frequentist hypothesis testing in the general context of model comparison; see [34] for details. The Bayesian approach to pairwise model comparison and model selection is based on the Bayes factor [35][36]. Let x be the image data set for the pair of pixel clusters. The Bayes factor is given by the ratio of the posterior odds to the prior odds in favour of a model $M_1$ over $M_2$,
$$B_{12} = \frac{p(M_1 \mid x) / p(M_2 \mid x)}{p(M_1) / p(M_2)} = \frac{p(x \mid M_1)}{p(x \mid M_2)}. \tag{17}$$
The Bayes factor in Eq (17) has the form of a likelihood ratio, and the densities $p(x \mid M_i)$ for i = 1, 2 are obtained by integrating (not maximizing) over the parameter space,
$$p(x \mid M_i) = \int p(x \mid \theta_i, M_i)\, \pi(\theta_i \mid M_i)\, d\theta_i, \tag{18}$$
where $\theta_i$ is the parameter of $M_i$, $\pi(\theta_i \mid M_i)$ is the prior density of the parameter, and $p(x \mid \theta_i, M_i)$ is the probability density of x given $\theta_i$, or the likelihood function of $\theta_i$.
In practice, the marginal probability of the data, also termed the marginal likelihood or integrated likelihood, obtained from Eq (18) is often difficult to compute. [37] extended the Bayes factor for a standard comparison of nested hypotheses in the general linear model in the p-dimensional multivariate normal case with an approximation, Eq (19), in which $\lambda$ is the likelihood ratio test statistic, $\delta_{r,r+1}$ is the degrees of freedom of the asymptotic chi-square distribution of $\lambda$, $n_{r,r+1}$ is the number of data points in the merged cluster, and $\rho(n_{r,r+1})$ is the rate of "shrinkage" of the prior covariance matrix, which can be approximated by $n_{r,r+1}$ when $n_{r,r+1}$ is large. Unfortunately, in the clustering context the regularity conditions do not hold for $\lambda$ to have its usual asymptotic null chi-square distribution with $\delta_{r,r+1}$ degrees of freedom. Based on a small scale simulation study of multivariate normal component densities with a common covariance matrix for the number of clusters k = 1 versus k = 2, [38] suggested using $2\delta_{r,r+1}$ degrees of freedom to get around the problem.
In the proposed algorithms, the decision for each pair of pixel clusters is between the models with the number of clusters k = 1 and k = 2. Thus, we adopt the special case of Eq (19) with r = 1 and Wolfe's approximation, and further assume that the merged pixel cluster size is large for image data clustering, to approximate the Bayes factor as in Eq (20). Let the maximum log-likelihoods for the pair of pixel clusters involved in the merger be $\log L_i$ and $\log L_j$ respectively, and the maximum log-likelihood for the cluster resulting from the merger be $\log L_m$. Therefore, the term $\lambda$ can be written as
$$\lambda = 2\left(\log L_i + \log L_j - \log L_m\right). \tag{21}$$
From Section 3.1, the pixel clusters involved in the merger are described by their MLEs decomposed from Gaussian mixture models. Therefore, the merged pixel cluster will be described by the weighted MLEs of the pair of pixel clusters (see Section 3.3). The maximum log-likelihood functions of the paired and the merged clusters are of the same form, which for a cluster of n d-dimensional pixels is given by
$$\log L(\hat{\mu}, \hat{\Sigma}) = -\frac{nd}{2}\log(2\pi) - \frac{n}{2}\log|\hat{\Sigma}| - \frac{1}{2}\sum_{i=1}^{n} (x_i - \hat{\mu})^\top \hat{\Sigma}^{-1} (x_i - \hat{\mu}).$$
Thus, the concentrated log-likelihood is
$$\log L = -\frac{n}{2}\left[d \log(2\pi) + \log|\hat{\Sigma}| + d\right]. \tag{22}$$
Substituting Eq (22) for the paired and the merged clusters, and Eq (21), into Eq (20), we obtain the proposed modified Bayes factor (MBF), Eq (23). The MBF suggests the choice of model based on the change in log-likelihood that results from merging the pair of pixel clusters. From the concentrated log-likelihood, it can be seen that the smaller the generalized variance $|\hat{\Sigma}|$, the larger the log-likelihood. Thus, if the MBF is positive, the merged cluster gives a bigger generalized variance and a smaller (more negative) log-likelihood than the pair of pixel clusters, which suggests that the pair of pixel clusters should not be merged; in other words, they are dissimilar. On the other hand, if the MBF is negative, the merged cluster gives a smaller generalized variance and a larger (less negative) log-likelihood, and the pair of pixel clusters should be merged, which implies that the clusters are similar.
The main advantage of the proposed MBF similarity measure is that it provides not only information for the compact representation of the entire image, by merging similar clusters to produce a higher maximum log-likelihood, but also information for domain adaptation.
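The likelihood ratio term can be evaluated directly from cluster descriptors, without revisiting the pixels. The sketch below is a univariate illustration with hypothetical function names. It assumes that the merged cluster is obtained by matching the first and second moments of the pair, which may differ from the paper's weighted MLEs, and it computes only the likelihood ratio statistic, not the MBF itself, whose penalty term is not reproduced here.

```python
import math

def concentrated_loglik(n, d, log_det_cov):
    """Gaussian log-likelihood maximised over the mean and covariance:
    -(n / 2) * (d * log(2 * pi) + log|Sigma_hat| + d)."""
    return -0.5 * n * (d * math.log(2.0 * math.pi) + log_det_cov + d)

def merge_1d(a, b):
    """Pool two (mean, variance, size) descriptors by moment matching (an assumption)."""
    (m_i, v_i, n_i), (m_j, v_j, n_j) = a, b
    n = n_i + n_j
    mean = (n_i * m_i + n_j * m_j) / n
    var = (n_i * (v_i + (m_i - mean) ** 2) + n_j * (v_j + (m_j - mean) ** 2)) / n
    return mean, var, n

def likelihood_ratio(a, b):
    """2 * (log L_i + log L_j - log L_m) for two univariate descriptors."""
    def loglik(desc):
        _, var, n = desc
        return concentrated_loglik(n, 1, math.log(var))
    return 2.0 * (loglik(a) + loglik(b) - loglik(merge_1d(a, b)))
```

Two identical descriptors give a statistic of zero, while two unit-variance clusters of 100 pixels whose means are 10 units apart give 200 log 26, reflecting the much larger variance of the merged cluster.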

Domain adaptation and compact representation
A mixture model distribution is proposed to aggregate the sets of local image descriptors in the format of GMM MLEs into a compact representation of the entire image. As the different image samples or blocks may have different numbers of descriptors and some descriptors may consist of novel local features, domain adaptation will be performed.
Consider the GMM representation of $S_1$ in Section 3.1 as the source domain, and the descriptors from the other samples or blocks as the target domain. Let $(\hat{\mu}_i, \hat{\Sigma}_i, n_i)$ and $(\hat{\mu}_j, \hat{\Sigma}_j, n_j)$ be the decomposed MLEs of the pair of pixel clusters from the source and target domain respectively, and let $(\hat{\mu}_m, \hat{\Sigma}_m, n_m)$ be the MLEs of the sufficient statistics $(\mu_m, \Sigma_m)$ of the merged cluster, where $n_m$ is the merged cluster size. If the MBF suggests that the two descriptors are similar and can be merged, the parameters of the GMM trained from $S_1$ are updated using weighted MLEs as follows:
1. The MLEs for the merged cluster are estimated as the weighted MLEs of the pair of pixel clusters.
2. The mixture proportions of the trained model are updated, with one expression for the component involved in merging and another for the other components.
The GMM is then updated accordingly. On the other hand, if the MBF suggests that the two descriptors are dissimilar, domain adaptation is performed by adding a new mixture component. The mixture proportions of the model are updated using Eq (28) for the newly added cluster and Eq (29) for the other existing components. The GMM is now given by Eq (30).
The compact representation of the entire image is obtained through incremental model updates. In each iteration of the model update, only the GMM MLEs are used. The domain adaptation is performed over two sets of MLEs instead of revisiting the pixel data points. Hence, the proposed FlexClustS and FlexClustB clustering algorithms are scalable to very large image data sets. In the reconstruction of an image using the GMM compact representation, a mixture component without any assignment is considered a spurious component and is removed, as this has almost no negative impact on the model quality [39].
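The incremental update, in which each target descriptor is either merged into an existing component or added as a new one, can be sketched as below. This is an illustration under stated assumptions rather than the paper's update equations: the function names are hypothetical, `dissimilarity` is a stand-in for the MBF (negative values meaning "similar"), merging uses moment matching, and mixture proportions are taken as relative component sizes.

```python
def pool(a, b):
    """Moment-matched merge of two univariate (mean, variance, size) descriptors."""
    (m_i, v_i, n_i), (m_j, v_j, n_j) = a, b
    n = n_i + n_j
    mean = (n_i * m_i + n_j * m_j) / n
    var = (n_i * (v_i + (m_i - mean) ** 2) + n_j * (v_j + (m_j - mean) ** 2)) / n
    return mean, var, n

def absorb(model, target, dissimilarity, threshold=0.0):
    """Merge `target` into its closest component when the score is below the
    threshold; otherwise append it as a new component (domain adaptation)."""
    scores = [dissimilarity(component, target) for component in model]
    best = min(range(len(model)), key=scores.__getitem__)
    if scores[best] < threshold:
        return model[:best] + [pool(model[best], target)] + model[best + 1:]
    return model + [target]

def proportions(model):
    """Mixture proportions as relative component sizes (an assumption of this sketch)."""
    total = sum(n for _, _, n in model)
    return [n / total for _, _, n in model]
```

Only descriptors are touched during the update, so the cost of absorbing one target descriptor grows with the number of components rather than with the number of pixels.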
The FlexClustS and FlexClustB algorithms are summarized in Algorithms 1 and 2 respectively.
Algorithm 1. FlexClustS.
Stage 1: Isolate the small pixel clusters using Eq (15).
Stage 2: Divide the image into samples. Add the isolated pixels to one of these samples. Represent each sample using Eq (3), with the descriptor given by Eq (16).
Stage 3: Calculate the similarity measure between the descriptors obtained from Stage 2 using Eq (23). Aggregate the sets of local image descriptors based on the similarity measures. If descriptors are dissimilar, perform domain adaptation and update the GMM using Eqs (28) and (29). The representation in GMM is given by Eq (30).

Algorithm 2. FlexClustB.
Stage 1: Isolate the small pixel clusters using Eq (15).
Stage 2: Divide the image into blocks. Add the isolated pixels to one of these blocks. Represent each block using Eq (3), with the descriptor given by Eq (16).
Stage 3: Calculate the similarity measure between the descriptors obtained from Stage 2 using Eq (23). Aggregate the sets of local image descriptors based on the similarity measures. If descriptors are dissimilar, perform domain adaptation and update the GMM using Eqs (28) and (29). The representation in GMM is given by Eq (30).

Algorithms for comparison
The performance of the proposed FlexClustS and FlexClustB is compared to two existing mixture model algorithms: Strategy III [1] (see Section 2.2) and sufficient EM [25] (see Section 2.3) respectively. Strategy III and sufficient EM are chosen to represent, respectively, the sampling based and block based methods of processing scaled-down image data mentioned in Section 1. Strategy III applies mixture model clustering to a sample of the full data, then extends five tentative best models from the sample to the full data via further EM iterations, and finally selects the best model from among the tentative best models. Sufficient EM is a variant of EM used in parameter estimation for mixture model clustering of multiple sets of sufficient statistics (i.e. means, covariances and numbers of data points). Each set of sufficient statistics characterizes a dense region of data points that is obtained by k-means clustering.

Data
Three sets of simulated data with known cluster labels and five sets of image data (St Paulia, cytology, Lena, sailboat and San Diego) are used to evaluate the performance of the proposed algorithms.
The first set of simulated data consists of 15,000 data points generated from a seven-component two-dimensional Gaussian mixture distribution. Special attention is paid to the relatively small nested Cluster-6. The parameters for the data set are as follows: The second and third sets of simulated data are generated using the population parameters of the wine and iris data set from UCI machine learning repository [40] that are fitted to the three-component VVI model and three-component VEV model [21,41] respectively (available at http://archive.ics.uci.edu/ml/datasets.html). The generated wine and iris data sets are of sizes 20,000 and 10,000 respectively. The wine data set is concerned with the chemical quantities of 13 constituents found in each of the three types of wines grown in the same region in Italy. It has "well behaved" class structures. The iris data set contains 3 classes (Versicolor, Virginica, and Setosa) of iris plant based on the measurement of four features i.e. sepal length and width, petal length and width. Two of the three classes in the iris data are overlapping. These three sets of simulated data are calibrated using MixSim [42]. The calibration of data set is based on the criteria of average pairwise overlap and maximum pairwise overlap [43]. The calibration results are shown in Table 1.
Based on [43], the degree of component overlap is interpreted from the pairwise overlap value as follows: well separated (below 0.05), moderately separated (between around 0.05 and 0.1), and poorly separated (above 0.15). Therefore, the clusters of the first simulated data set have the highest degree of overlap, followed by the third data set and then the second data set.
Five sets of RGB image data St Paulia, cytology, Lena, sailboat and San Diego are considered for application. St Paulia (304 x 238 pixels) is a flower image which has been used in [1]. Identifying the small yellow flowers is of particular interest. Cytology (248 x 150 pixels) is obtained from the Internet (https://commons.wikimedia.org/wiki/File:Canine_transmissible_venereal_ tumor_cytology.JPG, owned by Joel Mills). Identifying the details of the cell structure is the

Evaluation criteria
The performance of FlexClust is assessed according to three main aspects: (i) how well the features in partial data are captured, (ii) how well the descriptors from different partial data are classified, and (iii) how well small but important clusters are recovered and incorporated through domain adaptation into the GMM compact representation of the entire data set.
With the known cluster label for each data point of the simulated data, the capture of cluster features and the classification of the descriptors are evaluated through the partitioning error and the labelling error. The partitioning error is measured by the Adjusted Rand Index (ARI) [45]. Given a set of n objects with two partitions U and V, the ARI is a chance-corrected measure of agreement based on the number of pairs of objects that belong to the same group or to different groups in the two partitions. Let a be the number of pairs placed in the same group in both U and V, b the number placed in the same group in U but in different groups in V, c the number placed in different groups in U but in the same group in V, and d the number placed in different groups in both. The ARI is defined as
$$\mathrm{ARI} = \frac{\binom{n}{2}(a + d) - \left[(a+b)(a+c) + (c+d)(b+d)\right]}{\binom{n}{2}^2 - \left[(a+b)(a+c) + (c+d)(b+d)\right]}.$$
The ARI is equal to one for perfect agreement, and takes a negative value if the agreement is lower than what is expected by chance. The labelling error is measured by the misclassification error, which is the proportion of data points that are clustered into the wrong group. The incorporation of novel clusters through domain adaptation is assessed by the model fit and the number of clusters in the final GMM compact representation of the entire data set. The value of the log-likelihood is used to assess the model fit.
For image data, the true class label of every pixel is normally unavailable. A more objective performance measure for different clustering algorithms in image processing should assess the quality of the reproduced images against the reference (ground truth or original) image. The simplest and most widely used image quality metric is the mean square error (MSE), where the intensity differences between the reproduced image and the reference image pixels are squared and then averaged. However, [46] showed that images distorted in different ways from the same original image can have nearly identical MSE while differing drastically in perceptual quality as judged by human vision. [46] developed the structural similarity index (SSI) for measuring the similarity between two aligned image signals. The SSI is a quantitative measurement of the quality of an image, provided a reference image is regarded as being of perfect quality. It is a combination of three components, namely luminance, contrast and structure. The comparison of images is based on the estimates of the intensity mean shift for luminance, the change of intensity standard deviation for contrast, and the change of normalized signal intensity for structure, or the collective remaining errors. The application of the SSI for image processing evaluation has been increasing rapidly and it has become a widely accepted image quality metric. In this section, the qualities of the images processed by FlexClust, Strategy III and sufficient EM are assessed by comparing them to the ground truth image, and the SSIs are computed. An SSI equal to one indicates that there is no loss of information in the reconstructed image, and the nearer the SSI is to 1, the better the image quality. All the SSIs in this study are computed with ssim.m [47]. The images are also assessed visually through qualitative evaluation, as human vision is efficient at detecting loss of image detail [48].
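For reference, the ARI can be computed from two label vectors with the standard library alone. The sketch below uses the contingency-table form of the index, which is algebraically equivalent to the pair-counting definition; the function name is an assumption of this example, and the index is undefined (division by zero) when both partitions put every object in a single group.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(u, v):
    """Adjusted Rand Index between two labelings of the same n objects."""
    n = len(u)
    # Pairs that fall in the same cell of the U x V contingency table.
    sum_cells = sum(comb(count, 2) for count in Counter(zip(u, v)).values())
    # Pairs that share a group within U, and within V.
    sum_rows = sum(comb(count, 2) for count in Counter(u).values())
    sum_cols = sum(comb(count, 2) for count in Counter(v).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    return (sum_cells - expected) / (max_index - expected)
```

Relabelling the groups does not change the index, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, whereas the crossed partition `[0, 1, 0, 1]` gives -0.5, a value below chance-level agreement.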

Experiment setting
For the comparison of the sampling based algorithms, the initial sample sizes in Strategy III are set equal to the sample sizes of FlexClustS. Two different sample sizes with 10 experiments each are considered so that the conclusions do not depend on the sample size or on the particular sample drawn. With consideration of reasonable computation time for the EM algorithm, the sample sizes selected for images with 23712 pixels to 262144 pixels range from 500 to 2500 pixels [1]. At the same time, this paper also intends to evaluate the effectiveness of the algorithms when a rather small proportion of the image is used as the sample. Therefore, the sample sizes considered are 1% and 2% of the image data, which are 723 and 1447 pixels for St Paulia, and 372 and 744 pixels for cytology. For the larger images (Lena, sailboat and San Diego), the sample sizes considered are 0.5% and 1% of the image data, which are 1310 and 2621 pixels respectively.
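The sample sizes quoted for the larger images can be checked with a short calculation. This sketch assumes that these images contain 262144 pixels (the largest size quoted above, equal to 512 x 512) and that the fractional sample size is truncated to an integer; both assumptions are consistent with the reported values but are not stated explicitly in the text.

```python
import math


def sample_size(n_pixels, fraction):
    """Number of sampled pixels, assuming the product is truncated."""
    return math.floor(n_pixels * fraction)


LARGE_IMAGE_PIXELS = 512 * 512  # 262144, assumed size of the larger images

half_percent = sample_size(LARGE_IMAGE_PIXELS, 0.005)
one_percent = sample_size(LARGE_IMAGE_PIXELS, 0.01)
print(half_percent, one_percent)
```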
In the evaluation of the block based algorithms, each image is divided into two different numbers of blocks, and the sufficient statistics of the pixel clusters are obtained from each block. The block sizes are 8x2 and 16x2 for St Paulia, 5x4 and 10x8 for cytology, and 8x8 and 16x16 for Lena, sailboat and San Diego. The total number of sets of resulting sufficient statistics is set to be about the same as the portion size in FlexClustB. However, if a set of sufficient statistics comes from a local dense region with only one data point, the number of sets of sufficient statistics has to be reduced so that the covariance exists.
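A minimal sketch of the block division is given below. It assumes that a setting such as 8x8 denotes a grid of 8 by 8 blocks, following the "m x n blocks" wording of the Introduction, and that the image is 512 x 512 pixels; the function name and the way uneven edges are handled are hypothetical choices made for illustration.

```python
def block_bounds(height, width, n_rows, n_cols):
    """Return (row_start, row_end, col_start, col_end) for each block of a grid.

    Edges are placed by integer division, so blocks differ in size by at most
    one pixel per side when the image does not divide evenly.
    """
    row_edges = [i * height // n_rows for i in range(n_rows + 1)]
    col_edges = [j * width // n_cols for j in range(n_cols + 1)]
    return [
        (row_edges[i], row_edges[i + 1], col_edges[j], col_edges[j + 1])
        for i in range(n_rows)
        for j in range(n_cols)
    ]


def block_area(bounds):
    r0, r1, c0, c1 = bounds
    return (r1 - r0) * (c1 - c0)


blocks = block_bounds(512, 512, 8, 8)
print(len(blocks), blocks[0], sum(block_area(b) for b in blocks))
```

Each tuple can be used to slice the pixel array of one block, which is then clustered on its own.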
All the clustering algorithms consider 2 to 10 clusters for the simulated data and 3 to 15 clusters for the image data. The clustering in all the experiments with FlexClustS, FlexClustB and Strategy III is performed using MCLUST [21,41], which considers ten parameterizations of the cluster covariance matrices and uses the solution obtained from hierarchical clustering to initialize the EM algorithm. MCLUST is selected for its comprehensive strategy for clustering, classification and density estimation with Gaussian mixture models, which is in line with the objectives of this paper. See [49] for more details on the comparison of the capabilities of R packages for Gaussian mixture modelling. For FlexClustS and Strategy III, only the four most elaborate models in MCLUST, i.e. EEE, EEV, VEV and VVV [1], are considered. The maximum number of iterations for all three algorithms is set to 100. For the simulated data, the sufficient statistics for sufficient EM are obtained by summarizing the dense regions of the whole data set.
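To make the difference between the four covariance models concrete, the sketch below counts their free parameters and evaluates the Bayesian information criterion in the form used by MCLUST (larger is better). The counts follow the volume, shape and orientation decomposition of the covariance matrices. Treating BIC as the selection criterion in these experiments, the choice d = 3 for colour pixels, and all function names are assumptions made for illustration.

```python
import math


def n_covariance_params(model, k, d):
    """Free covariance parameters for k components in d dimensions."""
    orientation = d * (d - 1) // 2  # parameters of one orientation matrix
    shape = d - 1                   # parameters of one normalised shape
    if model == "EEE":  # equal volume, shape and orientation
        return 1 + shape + orientation
    if model == "EEV":  # equal volume and shape, varying orientation
        return 1 + shape + k * orientation
    if model == "VEV":  # varying volume, equal shape, varying orientation
        return k + shape + k * orientation
    if model == "VVV":  # everything varies between components
        return k * (1 + shape + orientation)
    raise ValueError(f"unknown model name: {model}")


def n_params(model, k, d):
    """Mixing proportions + means + covariance parameters."""
    return (k - 1) + k * d + n_covariance_params(model, k, d)


def bic(log_likelihood, model, k, d, n):
    """BIC as 2 * logL - m * log(n), so that a larger value is preferred."""
    return 2.0 * log_likelihood - n_params(model, k, d) * math.log(n)


for name in ("EEE", "EEV", "VEV", "VVV"):
    print(name, n_params(name, k=3, d=3))
```

Because the penalty grows with the parameter count, a VVV model has to improve the log-likelihood more than an EEE model with the same number of clusters before it is preferred.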

Result on simulation
Results of the simulation study are shown in Table 2. For the sampling based algorithms, FlexClustS outperforms Strategy III in terms of the agreement of partition, agreement of class label, and model fit for the well separated components of Data Set 2 and the moderately separated components of Data Set 3, regardless of the sample size. FlexClustS also performs better than Strategy III for the poorly separated components of Data Set 1 in terms of agreement of partition and model fit for both sample sizes. However, the labelling error for this data set is influenced by the sample size: the labelling error of FlexClustS is slightly higher than that of Strategy III when the sample size is 500, but much lower when the sample size increases to 1000. For the partition based algorithms, FlexClustB outperforms sufficient EM in terms of the agreement of partition, agreement of class label, and model fit for data sets with different degrees of component overlap.
Based on the estimated number of clusters (K̂), FlexClustS determines the correct number of clusters more consistently and accurately than the other algorithms across different degrees of component overlap and dimensions of the data set. The only case in which FlexClustS does not identify the correct number of clusters in 100% of the runs is Data Set 1 with sample size 1000, where it still does better than the other algorithms. The model update of FlexClustS in Data Set 1 illustrates how MBF helps FlexClustS to outperform the other algorithms in recovering a novel local feature that has not been identified in the earlier portions of the data, and how the proposed domain adaptation in the model update brings the estimated model parameters closer to the actual values and results in higher log-likelihood values. In Fig 2(A), the MLEs of the means from the initial sample of size 500 show that Cluster-6 is not found at this stage, and the MLEs of the means are further from the actual means. In the third sample, Cluster-6 is identified, as depicted in Fig 2(B), and the MBF suggests that it is a new cluster. From Fig 2(B) to 2(C), no new cluster is found, and the MLEs of the means move closer to the actual means. Strategy III and sufficient EM tend to overestimate the number of clusters and identify superfluous components or even identical clusters. Fig 3 shows examples of the cluster structures obtained from the three algorithms.
The results show that the effects of the initial sample and the sample size are minimal for all the algorithms. However, as with other sampling based algorithms, the sample size does affect the performance of FlexClustS in determining the correct number of clusters and the agreement of class label when the components are poorly separated. The complexity of the final model obtained by FlexClustS, in terms of the number of clusters, is observed to increase with the sample size. More clusters are used to describe the sample, especially in the overlapping area between the elongated Cluster 1 and Cluster 2, when the portion size increases.
In terms of computational time, as shown in Figs 4 and 5, FlexClustS is slower than Strategy III, and FlexClustB is slower than sufficient EM, in the 2-dimensional Data Set 1. However, FlexClustS runs faster than sufficient EM on this data set. For the higher dimensional data sets, FlexClustS and FlexClustB take longer than the competitor algorithms.
The results for the images processed from sample data are shown in Table 3. FlexClustS reproduces better quality images, based on the structural similarity index, for St Paulia, cytology and sailboat regardless of the sample size used. With the larger sample size, FlexClustS reproduces all images with slightly higher SSIs than with the smaller sample size. However, the same result is not observed for Strategy III: with the larger sample size, Strategy III reproduces slightly higher SSIs for cytology and sailboat, but not for St Paulia, Lena and San Diego. Although the sample size selection influences the final result, its effect is minimal. Furthermore, when the SSIs of FlexClustS and Strategy III are compared on the same image across different sample sizes, the results are consistent. It is interesting to note that even when FlexClustS processes only 10% of the image data, the SSIs of these images are better than those obtained by Strategy III. Fig 6 shows that the SSI does not change much as FlexClustS processes from 10% to 100% of the image data. Some SSIs improve as a larger percentage of the image data is processed, but some decline. For San Diego, the SSI of FlexClustS [n = 1%N] after processing 10% of the image data is higher than that of Strategy III, but it is lower after processing the whole image data.
From Figs 12 and 13, it can be seen that the feature of the road is better preserved by FlexClustS than by Strategy III. A considerable number of pixels of the road are mistakenly assigned the colour of the sky or river.
Results of images processed through dividing the image into blocks are shown in Table 4. FlexClustB demonstrates good overall performance. The SSIs of all the images by FlexClustB are higher than those of sufficient EM, especially for sailboat (16x16), where they are 0.8826 and 0.5506 respectively. It is interesting to note that the SSIs of FlexClustB are the highest for all the images, even when compared to FlexClustS.
Fig 4 (see Table 1 for the values of n and np). https://doi.org/10.1371/journal.pone.0180307.g004
Examples of images processed by FlexClustB and sufficient EM are shown in Figs 15-20. When the images are assessed visually, it is found that regardless of the number of blocks the

Evaluation of number of clusters.
The cluster numbers obtained by FlexClustS and Strategy III based on sample data, and by FlexClustB and sufficient EM based on the division of the image into blocks, are summarized for the 5 images in Table 5. The results show that the numbers of clusters obtained by FlexClustS and FlexClustB are always larger than those obtained by Strategy III and sufficient EM, and thus produce better quality in image recovery. The results are consistent with the image segmentation results of [50], where an insufficient number of clusters could lead to classification errors in image segmentation, and with the Gaussian mixture model of [51], which tends to describe a similar structure in an image via multiple components, each representing a different level of contrast.
The optimal percentage of data that should be processed by FlexClustS in order to speed up the algorithm will be studied in future. In Fig 21(B), the SSI of FlexClustB is always higher than that of sufficient EM, but at the cost of longer computational time.

Discussion and conclusion
In processing a scaled down image, either from sample data or from blocks of a divided image, the representation of the partial image, the similarity measure and the domain adaptation are the three crucial issues to be addressed. The FlexClust algorithm is proposed to tackle these problems. FlexClust can be implemented in two ways, by dividing the image into either multiple samples (FlexClustS) or blocks (FlexClustB). Whatever method is used to represent the image, there is loss of information. The problem is even more challenging when working on partial image data, where small but important information tends to be missed. It is important to address this issue especially in medical imaging, as often there are only subtle differences in visual features between the normal and pathological images [13]. This paper tackles the problem with two approaches: (i) using a detail preserving method for image representation, and (ii) recovering small and useful information from multiple portions of the full data. Most of the existing methods use distance based methods such as k-means to summarize the partial data and represent it as a triplet of sufficient statistics [24][25][52], which does not capture image features well if they are not spherical in shape. The results show that FlexClust enhances the detection of small details of the image by using GMM with a scan and select procedure. The descriptors of local features given by the MLEs of the GMM capture features of different orientation, volume and size [20,31]. Sampling based methods lead to unstable results [1]. Although Strategy III chooses the best model from multiple tentative best models, the trained models are still based on the same sample data, so the lack of representativeness of the sample has not been fully addressed.
Note: see Table 3 for sample size, and Table 4 for block sizes.
FlexClustS, which incorporates multiple GMM clusterings from multiple samples, helps to alleviate the problem. The distinctive feature of FlexClustS is that it allows domain adaptation, in which it recovers and incorporates a cluster that has not been identified in the previous samples, as illustrated in the simulation study. The proposed domain adaptation makes use of only the GMM MLE descriptors from the source and target domains. Existing domain adaptation techniques mainly work by reducing the difference between the distributions of the domains [53] or by discovering a good feature representation across domains [54][55]. However, there is very limited work on domain adaptation for mixture models. The choice of similarity measure is normally affected by how the image is represented and the type of descriptor used. FlexClust shows that using MBF as a similarity measure to classify the detail preserving descriptors of GMM MLEs avoids loss of feature details. It is an improvement on [24][25], whose findings show that using GMM to classify descriptors (e.g. triplets of sufficient statistics) obtained from a distance based clustering method (e.g. k-means) performs better than algorithms that use a distance based clustering method both for producing the descriptors of the image representation and for classifying them [52]. In the results for the block based images, both FlexClustB and sufficient EM avoid the blocking artifact problem. This is an advantage of GMM based algorithms.
The results show that MBF works effectively as a similarity measure, although FlexClustS and FlexClustB take longer computational time than the other methods. However, the longer computational time is compensated by better image quality, with a higher SSI and better preservation of features. FlexClust thus offers an alternative for medical imaging, where good quality of image reconstruction is important and no loss of information can be tolerated [56]. Furthermore, it is worth noting that the second stage of FlexClust, which involves the EM algorithm for multiple samples or blocks, can be done independently for each sample or block. A substantial speed up can therefore be obtained by a parallel implementation on several processors [57].
Future work can be devoted to generalizing the proposed algorithms to handle noisy images, and to determining the optimal percentage of data that should be processed by FlexClustS in order to reduce the computational time in the second stage.