Abstract
The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) based on posterior probabilities, with a soft capability whereby each data point can belong fractionally to multiple clusters, and show the benefit of the proposed Elastic K-means. Furthermore, in many applications, pairwise relations (graph information) are available besides vector attribute information. We therefore integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities that are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of the EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model.
Citation: Zheng A, Jiang B, Li Y, Zhang X, Ding C (2017) Elastic K-means using posterior probability. PLoS ONE 12(12): e0188252. https://doi.org/10.1371/journal.pone.0188252
Editor: Feiping Nie, Northwestern Polytechnical University, UNITED STATES
Received: July 25, 2017; Accepted: November 5, 2017; Published: December 14, 2017
Copyright: © 2017 Zheng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The AT&T dataset is available at: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html; the USPS, MNIST and BinAlpha datasets are available at: http://www.cs.nyu.edu/~roweis/data.html; the COIL-20 dataset is available at: http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php; the Isolet1 dataset is available at: http://archive.ics.uci.edu/ml/datasets/ISOLET.
Funding: This work was funded by the Natural Science Foundation of Anhui Province (1508085QF127) and the Natural Science Foundation of Anhui Higher Education Institutions of China (KJ2016A114).
Competing interests: The authors have declared that no competing interests exist.
Introduction
Data clustering, a method of unsupervised learning and a common technique for statistical data analysis, is now widely used in machine learning [1, 2], pattern recognition [3, 4], image analysis [5, 6] and bioinformatics [7–9]. As one of the most popular clustering methods, K-means has drawn much attention and has been widely applied to data clustering. Recent work has revealed that K-means can be represented in a matrix factorization formulation, from which many kinds of K-means variants can be derived [10–15]. From the point of view of determinacy, K-means clustering approaches are hard clustering: each data point belongs to exactly one class. However, in many cases this is not realistic, since not every data point distinctly belongs to a single class, especially outliers, which may be roughly equally similar to some or all classes.
On the other hand, the above K-means-inspired methods generally deal with the attribute information (feature vectors) of the data. In real-world applications, besides attribute information, we often have various pairwise relations between data points, which are expressed as graph data. There exist many clustering methods that utilize graph data, such as Ratio Cut based methods [16, 17], Normalized Cut based methods [18, 19] and MinMax Cut based methods [20, 21]. In essence, these clustering methods first embed the graph nodes in a low-dimensional space using the linear embedding method PCA or nonlinear methods such as IsoMAP [22–24], Locally Linear Embedding (LLE) [25–27] and Local Tangent Space Alignment [26, 28, 29], where feature vector information is utilized to obtain the final clustering results. However, it is one-sided to take only a single source of information into account.
Inspired by recent work on matrix factorization based K-means models, in this paper we first propose an elastic clustering model using posterior probability (named Elastic K-means, EKM). The main observation behind our Elastic K-means model is that the cluster indicator from K-means can be interpreted as posterior cluster probabilities during the factorization procedure. The key point of Elastic K-means is that each data point is assigned fractionally to several possible clusters according to its posterior probabilities. We then extend EKM to graph EKM (gEKM), which simultaneously utilizes both feature vectors and pairwise relations between data points. Specifically, gEKM integrates the proposed Elastic K-means with Normalized Cut clustering. To evaluate the effectiveness of the proposed EKM and gEKM, we run them on six benchmark datasets. The promising experimental results demonstrate the benefit of the proposed Elastic K-means models.
Elastic K-means and related works
K-means
Given a data matrix X = (x1, ⋯, xn) ∈ ℜp×n of n data points, the objective of K-means clustering is to find the cluster centroids ck, k = 1, ⋯, K, by minimizing the following cost function:

$$\min \sum_{k=1}^{K} \sum_{x_i \in C_k} \|x_i - c_k\|^2 \qquad (1)$$

where Ck denotes the set of data points assigned to cluster k.
Our work starts with the observation that the above K-means clustering objective can be reformulated as below:
Proposition 1. The objective function Eq (1) can be rewritten as:
$$\min_{G} \|X - XGG^\top\|^2 \qquad (2)$$
where nk, k = 1, ⋯, K, denotes the number of data points in cluster k. Note that ∥A∥ denotes the Frobenius norm of a matrix A, and G ∈ ℜn×K is the normalized cluster indicator matrix, i.e.,
$$G_{ik} = \begin{cases} 1/\sqrt{n_k} & \text{if } x_i \in C_k \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
Proof of Proposition 1.
Let us introduce the standard cluster indicator H:

$$H_{ik} = \begin{cases} 1 & \text{if } x_i \in C_k \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$
Then, Eq (1) can be equivalently written as:

$$\min_{C, H} \|X - CH^\top\|^2 \qquad (5)$$

where C = (c1, c2, ⋯, cK) ∈ ℜp×K are the cluster centroids and H is the standard cluster indicator of Eq (4). The orthogonality of G is preserved in a relaxation of the class indicator matrix [21]. Now, the cluster centroids can be written as C = XH(H⊤H)−1, i.e., ck = Xhk/nk, where hk is the k-th column of H. Thus:

$$\min_{H} \|X - XH(H^\top H)^{-1}H^\top\|^2 = \min_{G} \|X - XGG^\top\|^2 \qquad (6)$$

where G = H(H⊤H)−1/2 absorbs the unknown factor (H⊤H)−1/2.
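Proposition 1 is easy to check numerically. The following sketch (plain NumPy, with an arbitrary hard assignment standing in for the K-means partition) verifies that the K-means objective of Eq (1) coincides with the matrix-factorization form ∥X − XGG⊤∥² when G is the normalized indicator of Eq (3):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, K = 4, 30, 3

X = rng.normal(size=(p, n))              # data matrix; columns are points
labels = rng.integers(0, K, size=n)      # an arbitrary hard assignment

# K-means objective Eq (1): sum of squared distances to cluster centroids
J_kmeans = 0.0
for k in range(K):
    Xk = X[:, labels == k]
    ck = Xk.mean(axis=1, keepdims=True)  # centroid of cluster k
    J_kmeans += ((Xk - ck) ** 2).sum()

# Normalized indicator Eq (3): G_ik = 1/sqrt(n_k) if x_i is in cluster k
G = np.zeros((n, K))
for k in range(K):
    nk = (labels == k).sum()
    G[labels == k, k] = 1.0 / np.sqrt(nk)

J_matrix = np.linalg.norm(X - X @ G @ G.T) ** 2   # matrix form of Eq (2)

assert np.allclose(J_kmeans, J_matrix)   # the two objectives agree exactly
```

Note that for this G the columns are orthonormal, G⊤G = I, which is the property relaxed below.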
Elastic K-means
Proposition 1 provides a natural way to form the Elastic K-means. First, the orthogonality constraint GTG = I ensures that each data point belongs to only one cluster, as in [21, 30]. We relax this constraint so that each data point can belong to multiple clusters. Because the formulation Eq (2) is a matrix decomposition, the constraint GTG = I will still be satisfied approximately.
Second, since mixed signs in G may make the results deviate severely from the true clustering solution [21, 30], we relax the discrete constraint on G to a simple nonnegativity constraint G ≥ 0. Thus, the final model is formulated as:
$$\min_{G \ge 0} \|X - XGG^\top\|^2 \qquad (7)$$
Note that the input data X has mixed signs.
We call this Elastic K-means (EKM). (1) The solution of this model is invariant w.r.t. X → βX for any constant β ∈ ℜ; this is the same as standard K-means. (2) Similar to K-means, the expected cluster centroids are adopted during the updating iterations. (3) Different from traditional K-means, the elastic indicator (interpreted as a posterior probability, which is a fraction) converges smoothly during updating. In fact, all the NMF-based methods discussed above have K-means clustering interpretations when the factor G is orthogonal (GTG = I), which in turn means they can be regarded as relaxations of K-means.
Related works
Our goal is to study the case where a data point is not restricted to belong to a single cluster. Three related clustering models are (A) Gaussian mixtures [31, 32], (B) Fuzzy C-means [33–35] and (C) Fuzzy K-means [12, 14, 36, 37]. In general, Gaussian mixtures work only for low-dimensional data, typically d ≤ 10, and are therefore not suitable for high-dimensional data. Fuzzy C-means solves
$$\min_{U, C} \sum_{i=1}^{n} \sum_{k=1}^{K} u_{ik}^{m} \|x_i - c_k\|^2, \quad \text{s.t. } \sum_{k} u_{ik} = 1,\ u_{ik} \ge 0 \qquad (8)$$
where uik is the membership of data point xi belonging to cluster k and ∑k uik = 1.
Our model differs from Fuzzy C-means in two aspects: (1) The cluster distribution of a single data point xi is implicit in our model, whereas it is explicit in Fuzzy C-means. (2) Fuzzy C-means requires the posterior probabilities to satisfy ∑k uik = 1 for every data point xi, but the weights actually used in Eq (8) are uikm, with m = 2 in most cases. Thus the normalization of the actual cluster distribution is zi = ∑k uikm, and zi differs from one data point xi to another. This theoretical/conceptual drawback of Fuzzy C-means is not present in our Elastic K-means. Fuzzy K-means solves
(9)
where wik is the membership of data point xi in cluster k and ∑k wik = 1. The main difference between Fuzzy K-means and our method is that its soft/fuzzy capability is achieved by learning the corresponding weight wik (or uik in Fuzzy C-means) for each data point xi, while our Elastic K-means achieves this via the soft/fuzzy cluster indicator G, which is more explicit. Note that the concept of soft/fuzzy K-means (or C-means) has appeared in a number of works [38–42], which also offer fuzzy/soft capability for data clustering. However, these soft clustering methods are generally derived from Fuzzy C-means or Fuzzy K-means and their variants, which are essentially different from the proposed Elastic K-means model.
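The normalization issue of Fuzzy C-means discussed above can be made concrete. The sketch below applies the standard Fuzzy C-means membership update (for fixed centroids) and shows that although ∑k uik = 1 holds for every point, the effective weights uikm actually used in the objective sum to a point-dependent zi:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, m = 8, 3, 2                       # m = 2 is the common fuzzifier choice

X = rng.normal(size=(n, 2))             # points as rows, for convenience
C = rng.normal(size=(K, 2))             # some fixed centroids

# Standard Fuzzy C-means membership update for fixed centroids:
# u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)   # (n, K) distances
U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)

assert np.allclose(U.sum(axis=1), 1.0)  # memberships sum to 1 per point...

# ...but the weights actually applied in Eq (8) are u_ik^m, whose
# normalization z_i = sum_k u_ik^m differs from point to point:
z = (U ** m).sum(axis=1)
print(z)                                # generally unequal, and all below 1
```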
Because G is nonnegative in our model, our model also relates, in some sense, to NMF. There exists a very broad category of work along the NMF direction; we refer the reader to a recent survey [43].
Computational algorithm and analysis
Algorithm
An effective updating algorithm can be derived to solve the Elastic K-means problem. The algorithm iteratively updates the current solution as follows:
(10)
The pseudo code of EKM is given in Algorithm 1.
Algorithm 1: Elastic K-means Algorithm
Input: data X
(1) Initialize G0: construct the indicator G with Gik = 1 if xi belongs to cluster k, and Gik = 0 otherwise.
(2) Update G:
while not converged
    update G by Eq (10), where
    A = (|(XTX)ik| + (XTX)ik)/2
    B = (|(XTX)ik| − (XTX)ik)/2
end
The convergence proof of the proposed algorithm can be found in the supplementary material.
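Since the display of Eq (10) is not reproduced here, the following NumPy sketch implements a plausible multiplicative update for the EKM objective of Eq (7), derived by splitting XTX into the A and B parts of Algorithm 1 and using a 1/4 power (as is standard for quartic objectives). The specific rule is our reconstruction, not necessarily the paper's exact Eq (10):

```python
import numpy as np

def ekm(X, K, n_iter=200, eps=1e-12, seed=0):
    """Sketch of Elastic K-means: min_{G >= 0} ||X - X G G^T||^2 (Eq (7)).

    The multiplicative rule below is our reconstruction, obtained by
    splitting the gradient into positive and negative parts using the
    A, B matrices of Algorithm 1; it is a plausible stand-in for the
    paper's Eq (10), not a verbatim copy.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    S = X.T @ X
    A = (np.abs(S) + S) / 2.0            # elementwise positive part of X^T X
    B = (np.abs(S) - S) / 2.0            # elementwise negative part of X^T X
    G = np.abs(rng.normal(size=(n, K)))  # nonnegative random initialization

    for _ in range(n_iter):
        GtG = G.T @ G
        num = 2 * A @ G + B @ G @ GtG + G @ (G.T @ (B @ G))
        den = 2 * B @ G + A @ G @ GtG + G @ (G.T @ (A @ G)) + eps
        G = G * (num / den) ** 0.25      # 1/4 power: objective is quartic in G
    return G

def ekm_objective(X, G):
    """EKM objective value ||X - X G G^T||^2."""
    return np.linalg.norm(X - X @ G @ G.T) ** 2
```

After the iterations, each row of G can be normalized to sum to one and read as posterior cluster probabilities, as done in Eq (14) below.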
Updating algorithms for quartic models
There are many matrix models involving matrix variables to the 4th power. The simplest is:
$$\min_{Q \ge 0} \|W - QQ^\top\|^2 \qquad (11)$$
where W = WT is symmetric. A number of papers claim that the updating algorithm for this model is
$$Q_{ik} \leftarrow Q_{ik} \frac{(WQ)_{ik}}{(QQ^\top Q)_{ik}} \qquad (12)$$
In fact, this updating algorithm does not guarantee a monotonic decrease of the objective function value. Using the quartic matrix inequality of Eq (S15) in the supplementary material, we can easily prove that the following updating algorithm:
(13)
guarantees the decrease of the objective, and thus the convergence.
Benefit of Elastic K-means
One main drawback of standard K-means is that each data point is assigned to a single cluster (hard clustering), but in real data many data points lie somewhere in between different cluster centers (such as points 1, 2, 3 shown in Fig 1). One clear benefit of the proposed Elastic K-means is that such ambiguous data points are assigned to several nearby clusters, i.e., their posterior probabilities of cluster assignment are nearly evenly distributed.
x and y axes denote the first and the second dimension respectively.
Fig 1 shows the EKM results on a 2D dataset of 3 randomly generated Gaussian clusters. The EKM clusters are indicated by red, green and blue stars respectively. Points 4, 5, 6 clearly belong to their assigned clusters. This fact is correctly encoded in the EKM model, as shown by the posterior probabilities in Fig 2: they are sharply concentrated on a single cluster.
On the other hand, points 1, 2, 3 are ambiguous: they are in-between different cluster centers. A good ‘elastic’ clustering model should assign these points to nearby clusters with nearly-equal posterior probabilities. Indeed, this desirable fact is correctly encoded in the EKM model, as shown in the posterior probabilities in Fig 2: they are nearly evenly distributed on nearby clusters.
Posterior Probability. In this and following sections, we normalize G to
$$G_{ik} \leftarrow \frac{G_{ik}}{\sum_{k'} G_{ik'}} \qquad (14)$$
so that ∑k Gik = 1 for data point xi.
A natural question is how to identify ambiguous data points such as points 1, 2, 3 in Fig 2. For this purpose, we need to see how the posterior probability

$$p_i(k) = G_{ik} \qquad (15)$$

for data point xi is distributed over the clusters.
We use a simple quantity to indicate approximately how sharply the distribution is peaked around the highest cluster. This is the gap in the posterior distribution: the difference between the highest and second-highest peaks:

$$\Delta_i = p_i\big(k_i^{(1)}\big) - p_i\big(k_i^{(2)}\big) \qquad (16)$$

where

$$k_i^{(1)} = \arg\max_{k} p_i(k), \qquad k_i^{(2)} = \arg\max_{k \ne k_i^{(1)}} p_i(k) \qquad (17)$$
The idea is simple. (A) If a data point is well inside a cluster, such as points 4, 5, 6 in Fig 2, the highest peak is large and the second-highest peak is small, so the gap is big. (B) If a data point is in between different clusters, such as points 1, 2, 3, the difference between the highest and second-highest peaks is small, so the gap is small. In both situations, the gap is a useful indication of how sharply the posterior probability is distributed.
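Computing the gap from the indicator matrix is straightforward once each row is normalized as in Eq (14); a minimal sketch:

```python
import numpy as np

def posterior_gap(G):
    """Gap between the highest and second-highest posterior per point (Eq 16).

    G is the nonnegative indicator; each row is first normalized to sum
    to 1 (Eq 14) so that G_ik can be read as a posterior probability.
    """
    P = G / G.sum(axis=1, keepdims=True)   # Eq (14): rows sum to 1
    top2 = -np.sort(-P, axis=1)[:, :2]     # two largest posteriors per row
    return top2[:, 0] - top2[:, 1]

# A point well inside one cluster has a large gap; an ambiguous,
# in-between point has a gap near zero:
G = np.array([[9.0, 0.5, 0.5],    # confident point: posterior (0.9, 0.05, 0.05)
              [1.0, 1.0, 1.0]])   # ambiguous point: uniform posterior
print(posterior_gap(G))           # gap 0.85 for the first point, 0.0 for the second
```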
Fig 3 shows the distribution of the gap Δi over the 300 data points shown in Fig 1. Note that the larger Δi is, the more sharply the data point belongs to a single class. Clearly, there are many ambiguous data points, which can be easily detected using the gap. These studies indicate that Elastic K-means is able to detect the fuzzy/soft characteristics of data clustering.
Integrating Elastic K-means and normalized cut
EKM uses only vector (sometimes called attribute) data X. In many applications, the input data consists of both vector data X and similarity data W (also called graph data since Wij represents the similarity between data points i and j). For this situation, we can easily extend EKM to utilize the similarity data W for more effective elastic clustering.
We incorporate the similarity data through the Normalized Cut [18, 19] formalism. This extension forms a single clustering formulation for input data with both vector and similarity information.
The Ncut (Normalized Cut) is defined as:
$$\min_{Q^\top Q = I} \operatorname{Tr}\left(Q^\top \big(I - D^{-1/2} W D^{-1/2}\big) Q\right) \qquad (18)$$
where D = diag(d1, ⋯ , dn), di = ∑j Wij.
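The scaled similarity D−1/2WD−1/2, which plays the central role in what follows, is easy to construct. A minimal NumPy sketch, which also checks the standard fact that its leading eigenvalue is exactly 1 for a connected graph (attained by the eigenvector D1/2·1):

```python
import numpy as np

def scaled_similarity(W):
    """S = D^{-1/2} W D^{-1/2} for a symmetric nonnegative similarity W."""
    d = W.sum(axis=1)                    # degrees d_i = sum_j W_ij
    Dm12 = 1.0 / np.sqrt(d)
    return W * np.outer(Dm12, Dm12)      # elementwise form of D^{-1/2} W D^{-1/2}

rng = np.random.default_rng(0)
A = rng.random(size=(6, 6))
W = (A + A.T) / 2                        # a dense symmetric similarity (connected)
S = scaled_similarity(W)
print(np.linalg.eigvalsh(S).max())       # close to 1.0
```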
Proposition 2. Optimization problem Eq (18) is equivalent to
$$\min_{Q} \|S - QQ^\top\|^2 \qquad (19)$$

with the same constraints as Eq (18), where S = D−1/2WD−1/2.
Proof. Because ∥S − QQT∥2 = Tr(S2) − 2Tr(QTSQ) + Tr(QQTQQT), the first and last terms are constant due to QTQ = I. Minimizing −2Tr(QTSQ) is the same as Eq (18).
In the same manner as EKM in Eq (7), we relax the orthogonality constraint QTQ = I to a nonnegativity constraint Q ≥ 0; then Eq (19) can be rewritten as:
$$\min_{Q \ge 0} \|S - QQ^\top\|^2 \qquad (20)$$
Since Q plays the same role of cluster indicator as G in EKM, we can integrate EKM and Ncut by simplifying/approximating these constraints into G ≥ 0. The combined model is then expressed as:
$$\min_{G \ge 0} \|X - XGG^\top\|^2 + \alpha \|S - GG^\top\|^2 \qquad (21)$$
We call this model as graph EKM (simplified as gEKM in the following parts) since Ncut is a popular approach for graph data clustering. It naturally incorporates both the feature/attribute data X and pairwise relations W.
Analysis
Because the model is invariant under X → βX, we let β = 1/∥X∥, which means the input X is normalized: ∥X∥ = 1.
The above objective can be written as:
(22)
The first 2 terms are independent of G. Thus the above optimization becomes
(23)
where X is normalized.
This expression reveals an important insight: the gEKM model is essentially a discrete combinatorial optimization problem.
Algorithm 2
The optimization problem of Eq (21) can be solved by the following iterative algorithm:
(24)
where α1 = α∥X∥2 and A, B are defined as in Algorithm 1. The proof of convergence can be established similarly to that of Algorithm 1.
Experiments
We perform experiments on the following six benchmark datasets, described in Table 1.
AT&T (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html) is a face dataset containing ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, with varying lighting, facial expressions and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position. We stretch each image into a vector.
USPS (http://www.cs.nyu.edu/roweis/data.html) is a handwritten digit (1–10) database and we select 1000 images (100 images for every digit).
MNIST (http://www.cs.nyu.edu/roweis/data.html) is a handwritten digit database. Each image is centered (according to the center of mass of the pixel intensities) on a 28×28 grid. In our experiments, we randomly choose 1000 images (i.e., 100 images per digit) and reshape each image into one vector.
COIL20 (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php) contains 20 objects. Images of each object were taken 5 degrees apart as the object was rotated on a turntable, and each object has 72 images. The size of each image is 32×32 pixels, with 256 grey levels per pixel. Each image is thus represented by a 1024-dimensional vector.
The Isolet1 (http://archive.ics.uci.edu/ml/datasets/ISOLET) dataset was generated by 150 subjects speaking the name of each letter of the alphabet twice. The speakers are grouped into sets of 30 speakers each, referred to as isolet1 through isolet5. The features include spectral coefficients, contour features, sonorant features, pre-sonorant features and post-sonorant features. In our experiment, we use the subset isolet1 only.
BinAlpha (http://www.cs.nyu.edu/roweis/data.html) contains 26 binary handwritten alphabets, and we select 30 images for each letter. We stretch each image into one vector.
Clustering accuracy is used to measure performance. Once a clustering solution is computed, the confusion matrix between clusters and ground-truth classes is calculated, and the Hungarian algorithm [44] is employed to match the obtained clusters with the ground-truth classes. The accuracy of this matching is reported as the clustering accuracy.
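The matching step can be implemented with `scipy.optimize.linear_sum_assignment` (a standard implementation of the Hungarian algorithm); a sketch, not the authors' exact code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Best-match clustering accuracy via the Hungarian algorithm.

    Builds the confusion matrix between predicted clusters and ground-truth
    classes, finds the cluster-to-class matching that maximizes the total
    agreement, and returns the matched fraction.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    K = max(y_true.max(), y_pred.max()) + 1
    confusion = np.zeros((K, K), dtype=int)
    for t, p in zip(y_true, y_pred):
        confusion[p, t] += 1
    row, col = linear_sum_assignment(-confusion)   # negate to maximize agreement
    return confusion[row, col].sum() / y_true.size

# A relabeled-but-perfect clustering scores 1.0:
print(clustering_accuracy([0, 0, 1, 1, 2], [2, 2, 0, 0, 1]))   # -> 1.0
```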
In all experiments, for each dataset, we first run K-means clustering 20 times with random starts. We pick the best result (lowest clustering objective function value) of these 20 runs and record the corresponding clustering accuracy as the K-means result.
Starting from this K-means result, we run NMF [11], FCM [34], MinMaxCut [21], EKM and gEKM, and record the corresponding results. For Ncut [18], we run K-means in the eigenspace 20 times with random starts and pick the best result (lowest K-means clustering objective function value) of these 20 runs.
Choice of parameter α
The clustering accuracy of gEKM depends on the parameter α. If α > 1, the similarity part (the 2nd term of Eq (21)) is weighted more heavily; if α < 1, the vector attribute part (the first term of Eq (21)) is weighted more heavily. For this reason, we choose α from {1/10, 1/5, 1/2, 1, 2, 5, 10}. We show the clustering results against α on all six datasets in Fig 4 (here X is in full dimension).
We observe that (1) the results are not very sensitive to α, and (2) α = 1 generally gives the best results, which in turn means the vector attribute part and the similarity part make roughly balanced contributions. Based on these observations, we fix α = 1 in the following experiments.
Convergence visualization
Fig 5 illustrates the evolution of the objective functions of Eqs (7) and (21), demonstrating the convergence of both objective functions.
Results on clustering
We compare our methods, EKM and gEKM, with K-means, NMF [11], FCM [34], Ncut [18] and MinMaxCut [21] on the six datasets. For gEKM, Ncut [18] and MinMaxCut [21], we compute the graph similarity W as:
(25)
where c = 0.7 and d is the average distance between each xi and its 7 nearest neighbors. We set Wii = 0 and construct the scaled similarity S = D−1/2WD−1/2 from W.
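The display of Eq (25) is not reproduced here; a common self-tuning Gaussian similarity consistent with the surrounding description (scale set by c times the average 7-nearest-neighbor distance d, and Wii = 0) can be sketched as below. The exact functional form is our assumption:

```python
import numpy as np

def build_similarity(X, c=0.7, n_neighbors=7):
    """Gaussian similarity with a neighborhood-based scale (sketch of Eq 25).

    Assumes the form W_ij = exp(-||x_i - x_j||^2 / (c*dbar)^2), where dbar
    is the average distance from each point to its 7 nearest neighbors;
    this is our reconstruction, not necessarily the paper's exact formula.
    """
    # Pairwise squared Euclidean distances between the columns of X
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    dist = np.sqrt(sq)
    # dbar: mean distance to the 7 nearest neighbors, averaged over points
    # (column 0 of the sorted distances is the point itself, distance 0)
    knn = np.sort(dist, axis=1)[:, 1:n_neighbors + 1]
    dbar = knn.mean()
    W = np.exp(-sq / (c * dbar) ** 2)
    np.fill_diagonal(W, 0.0)             # W_ii = 0, as in the paper
    return W
```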
Table 2 reports the clustering accuracy on the six datasets for K-means, NMF, FCM, Ncut, MinMaxCut, EKM and gEKM. One can see that (1) Ncut and MinMaxCut consistently outperform the existing NMF, FCM and K-means methods. (2) The proposed EKM beats Ncut and MinMaxCut in most cases. (3) MinMaxCut and Ncut perform comparably to the proposed EKM and are slightly better in some cases. (4) The integrated gEKM consistently outperforms the other six methods, including MinMaxCut and Ncut, which verifies the effectiveness of the integration. To further examine the performance of the proposed methods, we also run the clustering experiments in PCA subspaces of dimension p = 50, 100, 150, 200 on the six datasets. The obtained clustering accuracies are given in Tables 3 to 6. First, the results in the different PCA subspaces are consistent with the full-space results discussed above. Furthermore, as shown in Fig 6, the clustering results in PCA subspaces can be competitive with, or even slightly better than, those in the full space, which implies the effectiveness of subspace discovery.
where “full” on the x-axis denotes the full dimension of the datasets as indicated on Table 1.
Conclusion
In this paper, we first propose an Elastic K-means (EKM) framework which provides an elastic clustering solution, and we prove the correctness and convergence of its updating algorithm. Second, an integrated framework combining EKM and Normalized Cut (gEKM) is proposed to take into account both attribute data and pairwise relations, and the choice of the combination parameter is analysed in detail. The experimental results on six benchmark datasets demonstrate that the proposed EKM leads to better clustering accuracy than K-means as well as previous NMF-based algorithms and the popular Fuzzy C-means and Fuzzy K-means. The gEKM has also been shown to deliver satisfactory performance on data clustering.
References
- 1. Lechner M, Hernandez-Rosales M, Doerr D, Wieseke N, Thévenin A, Stoye J, et al. Orthology detection combining clustering and synteny for very large datasets. PLoS One. 2014;9(8):e105015. pmid:25137074
- 2. Huang J, Nie F, Huang H, Ding C. Robust Manifold Nonnegative Matrix Factorization. ACM Transactions on Knowledge Discovery from Data (TKDD). 2014;8(3):11:1–11:21.
- 3. Melin P, Castillo O. A review on type-2 fuzzy logic applications in clustering, classification and pattern recognition. Applied Soft Computing. 2014;21:568–577.
- 4. Yang H, Seoighe C. Impact of the Choice of Normalization Method on Molecular Cancer Class Discovery Using Nonnegative Matrix Factorization. PloS one. 2016;11(10):e0164880. pmid:27741311
- 5. Ji Z, Liu J, Cao G, Sun Q, Chen Qg. Robust spatially constrained fuzzy c-means algorithm for brain MR image segmentation. Pattern Recognition. 2014;47:2454–2466.
- 6. Wu S, Feng X, Zhou W. Spectral clustering of high-dimensional data exploiting sparse representation vectors. Neurocomputing. 2014;135:229–239.
- 7. Lakizadeh A, Jalili S. BiCAMWI: A Genetic-Based Biclustering Algorithm for Detecting Dynamic Protein Complexes. PloS one. 2016;11(7):e0159923. pmid:27462706
- 8. Edgardo M, Daniel T, Javier S, Carlos G, Francisco T, Alberto P. NMF-mGPU: non-negative matrix factorization on multi-GPU systems. BMC Bioinformatics. 2015;16:1–12.
- 9. Biswas AK, Gao JX, Zhang B, Wu X. NMF-Based LncRNA-Disease Association Inference and Bi-Clustering. In: IEEE International Conference on Bioinformatics and Bioengineering (BIBE); 2014. p. 97–104.
- 10. Zhang J, OReilly KM, Perry GL, Taylor GA, Dennis TE. Extending the functionality of behavioural change-point analysis with k-means clustering: a case study with the little penguin (eudyptula minor). PloS one. 2015;10(4):e0122811. pmid:25922935
- 11. Ding C, Li T, Jordan MI. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010;24:45–55.
- 12. Xu J, Han J, Xiong K, Nie F. Robust and Sparse Fuzzy K-Means Clustering. In: IJCAI; 2016. p. 2224–2230.
- 13. Liu H, Wu Z, Li X, Cai D, Huang TS. Constrained Nonnegative Matrix Factorization for Image Representation. IEEE Transcations on Pattern Analysis and Machine Intelligence. 2012;34:1299–1311.
- 14. Nie F, Zeng Z, Tsang IW, Xu D, Zhang C. Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering. IEEE Transactions on Neural Networks. 2011;22(11):1796–1808. pmid:21965198
- 15. Qiao H. New SVD based initialization strategy for Non-negative Matrix Factorization. Pattern Recognition Letters. 2015;63:71–77.
- 16. Hagen L, Kahng AB. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 1992;11:1074–1085.
- 17. Hu Z, Pan G, Wang Y, Wu Z. Spectral Sparse Representation for Clustering: Evolved from PCA, K-means, Laplacian Eigenmap, and Ratio Cut. arXiv preprint. 2014;arXiv:1403.6290.
- 18. Shi J, Malik JP. Normalized cuts and image segmentation. IEEE Transcations on Pattern Analysis and Machine Intelligence. 2000;22:888–905.
- 19. Yan X, Guo J, Liu S, Cheng Xq, Wang Y. Clustering short text using Ncut-weighted non-negative matrix factorization. In: CIKM’12 Proceedings of the 21st ACM international conference on Information and knowledge management; 2012. p. 2259–2262.
- 20. Neumayer S, Efrat A, Modiano E. Geographic max-flow and min-cut under a circular disk failure model. Computer Networks. 2015;77:117–127.
- 21. Nie F, Ding C, Luo D, Huang H. Improved minmax cut graph clustering with nonnegative relaxation. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer; 2010. p. 451–466.
- 22. Tenenbaum JB, de Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science. 2000;290:2319–2323. pmid:11125149
- 23. Bowen GJ, West J, Miller C. IsoMAP: Isoscapes Modeling, Analysis and Prediction (version 1.0). The IsoMAP Project. 2012.
- 24. Zhao Z, Chow TWS, Zhao M. M-Isomap: Orthogonal Constrained Marginal Isomap for Nonlinear Dimensionality Reduction. IEEE Transactions on Systems Man & Cybernetics Part B Cybernetics A Publication of the IEEE Systems Man & Cybernetics Society. 2013;43(1):180–191.
- 25. Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290(5500):2323–2326. pmid:11125150
- 26. Xiang S, Nie F, Pan C, Zhang C. Regression Reformulations of LLE and LTSA With Locally Linear Transformation. IEEE Transactions on Systems Man & Cybernetics Part B Cybernetics A Publication of the IEEE Systems Man & Cybernetics Society. 2011;41:1250–1261.
- 27. Deng T, Deng Y, Shi Y, Zhou X. Research on Improved Locally Linear Embedding Algorithm. In: Bio-Inspired Computing—Theories and Application; 2014. p. 88–92.
- 28. Zhang Zy, Zha Hy. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. Journal of Shanghai University (English Edition). 2004;8:406–424.
- 29. Wang J. Local tangent space alignment. In: Geometric Structure of High-Dimensional Data and Dimensionality Reduction. Springer; 2011. p. 221–234.
- 30. Yang Y, Shen HT, Nie F, Ji R, Zhou X. Nonnegative spectral clustering with discriminative regularization. In: AAAI; 2011. p. 2–4.
- 31. Xu L, Jordan MI. On Convergence Properties of the EM Algorithm for Gaussian Mixtures. Neural Computation. 1996;8(1):129–151.
- 32. Polanski A, Marczyk M, Pietrowska M, Widlak P, Polanska J. Signal partitioning algorithm for highly efficient Gaussian mixture modeling in mass spectrometry. PloS one. 2015;10(7):e0134256. pmid:26230717
- 33. Bezdek JC, Ehrlich R, Full W. FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences. 1984;10(84):191–203.
- 34. Tang JR, Isa NAM, Chng ES. A Fuzzy-C-Means-Clustering Approach: Quantifying Chromatin Pattern of Non-Neoplastic Cervical Squamous Cells. PloS one. 2015;10(11):e0142830. pmid:26560331
- 35. Bai C, Dhavale D, Sarkis J. Complex investment decisions using rough set and fuzzy c-means: an example of investment in green supply chains. European journal of operational research. 2016;248(2):507–521.
- 36. Bezdek JC. A convergence theorem for the fuzzy ISODATA clustering algorithms. IEEE transactions on pattern analysis and machine intelligence. 1980;(1):1–8. pmid:22499617
- 37. Li MJ, Ng MK, Cheung Ym, Huang JZ. Agglomerative fuzzy k-means clustering algorithm with selection of number of clusters. IEEE transactions on knowledge and data engineering. 2008;20(11):1519–1534.
- 38. Yin X, Chen S, Hu E. Regularized soft K-means for discriminant analysis. Neurocomputing. 2013;103(3):29–42.
- 39. Bai X, Luo S, Zhao Y. Entropy based soft K-means clustering. IEEE; 2008.
- 40. Kim J, Shim KH, Choi S. Soft Geodesic Kernel K-Means. In: Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on; 2007. p. II-429–II-432.
- 41. Yu YC, Wang JD, Zheng GS, Jiang Y. Distributed K-means based-on Soft Constraints. Journal of Software Engineering. 2011;.
- 42. Yang T, Wang J. A Robust k-Means Type Algorithm for Soft Subspace Clustering and Its Application to Text Clustering. Journal of Software. 2014;9(8).
- 43. Li T, Ding CH. Nonnegative Matrix Factorizations for Clustering: A Survey. In: Data Clustering: Algorithms and Applications; 2013. p. 149–176.
- 44. Kuhn HW. The Hungarian Method for the Assignment Problem. In: 50 Years of Integer Programming 1958–2008; 2010. p. 29–47.