Elastic K-means using posterior probability

The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) based on posterior probability, with soft clustering capability: each data point can belong to multiple clusters fractionally. We demonstrate the benefits of the proposed Elastic K-means. Furthermore, in many applications, pairwise relations (graph information) are available in addition to vector attributes. We therefore integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities that are useful for matrix formulations of learning models. Based on these results, we prove the correctness and convergence of the EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model.


Important Matrix Inequalities
To prove Algorithm 1, we need the following matrix inequalities.

(A) An inequality bounding a quadratic form of a matrix G sandwiched between two other matrices. For any symmetric nonnegative matrices P ∈ ℜ_+^{n×n} and Q ∈ ℜ_+^{K×K} (P = P^T, Q = Q^T) and any G, G' ∈ ℜ_+^{n×K},

\[
\sum_{i=1}^{n}\sum_{k=1}^{K}
\frac{\bigl(P G' Q\bigr)_{ik}\, G_{ik}^{2}}{G'_{ik}}
\;\ge\;
\mathrm{Tr}\bigl(G^{T} P G Q\bigr),
\]

with equality when G = G'. The inequality was proved in [1].
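As a quick numerical illustration (not part of the proof), the following short Python sketch checks inequality (A) on random matrices; the sizes, random seed, and variable names are assumptions made for the example:

```python
import numpy as np

# Numerical sanity check of inequality (A) on random nonnegative matrices.
rng = np.random.default_rng(0)
n, K = 8, 3

P = rng.random((n, n)); P = (P + P.T) / 2   # symmetric nonnegative, n x n
Q = rng.random((K, K)); Q = (Q + Q.T) / 2   # symmetric nonnegative, K x K
G  = rng.random((n, K)) + 0.1               # strictly positive to avoid division by zero
Gp = rng.random((n, K)) + 0.1               # G' in the text

lhs = np.sum((P @ Gp @ Q) * G**2 / Gp)      # sum_{ik} (P G' Q)_{ik} G_{ik}^2 / G'_{ik}
rhs = np.trace(G.T @ P @ G @ Q)             # Tr(G^T P G Q)
assert lhs >= rhs - 1e-12                   # the bound holds; equality at G = G'
```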
Another useful inequality involves a quartic form of a matrix H between two other nonnegative matrices. Let H, H' ∈ ℜ_+^{n×K} be nonnegative, and let A, B ∈ ℜ_+^{n×n} be symmetric. Then

\[
\sum_{i=1}^{n}\sum_{k=1}^{K}
\frac{\bigl(A H' H'^{T} B H' + B H' H'^{T} A H'\bigr)_{ik}\, H_{ik}^{4}}{2\,(H'_{ik})^{3}}
\;\ge\;
\mathrm{Tr}\bigl(H^{T} A H H^{T} B H\bigr),
\tag{S3}
\]

with equality when H = H'. A slight variant of the above also holds for A, B ∈ ℜ_+^{n×n}. This inequality is useful when analyzing objective functions involving 4th-order matrix polynomials, such as the NMF of Eq.(1). We will use it to prove the convergence of the algorithm of Eq.(S2).
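A similar sketch checks the quartic bound Eq.(S3) numerically; again the sizes and seed are illustrative assumptions:

```python
import numpy as np

# Numerical sanity check of the quartic inequality Eq.(S3).
rng = np.random.default_rng(1)
n, K = 8, 3

A = rng.random((n, n)); A = (A + A.T) / 2   # symmetric nonnegative, n x n
B = rng.random((n, n)); B = (B + B.T) / 2
H  = rng.random((n, K)) + 0.1
Hp = rng.random((n, K)) + 0.1               # H' in the text

S = A @ Hp @ Hp.T @ B @ Hp + B @ Hp @ Hp.T @ A @ Hp
lhs = np.sum(S * H**4 / (2 * Hp**3))        # bound side of Eq.(S3)
rhs = np.trace(H.T @ A @ H @ H.T @ B @ H)   # quartic form Tr(H^T A H H^T B H)
assert lhs >= rhs - 1e-12                   # the bound holds; equality at H = H'
```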

Proof of the inequality Eq.(S3)
Write H_{ip} = H'_{ip} µ_{ip}, so that µ_{ip} = H_{ip}/H'_{ip} > 0. The 1st term in the RHS of Eq.(S3) is

\[
\frac{1}{2}\sum_{i,p}\bigl(A H' H'^{T} B H'\bigr)_{ip}\,\frac{H_{ip}^{4}}{(H'_{ip})^{3}}
=\frac{1}{2}\sum_{i,j,p,q,r,k} A_{ij} B_{rk}\, H'_{ip} H'_{jq} H'_{rq} H'_{kp}\,\mu_{ip}^{4}.
\tag{S4}
\]

Now, switching indexes i ⇐⇒ j, p ⇐⇒ q, r ⇐⇒ k in Eq.(S4) and using A = A^T, B = B^T, we obtain the same sum with µ⁴_{ip} replaced by µ⁴_{jq} (Eq.(S5)); expanding the 2nd term in the RHS of Eq.(S3) in the same way and switching indexes once more give the corresponding sums with µ⁴_{rq} and µ⁴_{kp} (Eqs.(S6) and (S7)). Careful examination of the RHS of Eqs.(S4)-(S7) shows that they are identical except for the µ⁴ terms. Adding Eqs.(S4)-(S7) (which, by the index relabeling, represent the RHS of Eq.(S3) counted twice), we obtain that the RHS of Eq.(S3) is equal to

\[
\frac{1}{4}\sum_{i,j,p,q,r,k} A_{ij} B_{rk}\, H'_{ip} H'_{jq} H'_{rq} H'_{kp}
\bigl(\mu_{ip}^{4}+\mu_{jq}^{4}+\mu_{rq}^{4}+\mu_{kp}^{4}\bigr).
\tag{S8}
\]

Meanwhile, expanding the LHS of Eq.(S3) with H_{ip} = H'_{ip} µ_{ip} gives

\[
\mathrm{Tr}\bigl(H^{T} A H H^{T} B H\bigr)
=\sum_{i,j,p,q,r,k} A_{ij} B_{rk}\, H'_{ip} H'_{jq} H'_{rq} H'_{kp}\,
\mu_{ip}\,\mu_{jq}\,\mu_{rq}\,\mu_{kp}.
\tag{S9}
\]

Since every coefficient A_{ij} B_{rk} H'_{ip} H'_{jq} H'_{rq} H'_{kp} is nonnegative, if we can establish

\[
\mu_{ip}\,\mu_{jq}\,\mu_{rq}\,\mu_{kp}
\;\le\;
\frac{1}{4}\bigl(\mu_{ip}^{4}+\mu_{jq}^{4}+\mu_{rq}^{4}+\mu_{kp}^{4}\bigr),
\tag{S10}
\]

then the inequality Eq.(S3) holds. For any a, b, c, d > 0, applying 2xy ≤ x² + y² twice, we have

\[
abcd \;\le\; \frac{a^{2}b^{2}+c^{2}d^{2}}{2}
\;\le\; \frac{1}{2}\Bigl(\frac{a^{4}+b^{4}}{2}+\frac{c^{4}+d^{4}}{2}\Bigr)
= \frac{a^{4}+b^{4}+c^{4}+d^{4}}{4}.
\]

Thus, setting (a, b, c, d) = (µ_{ip}, µ_{jq}, µ_{rq}, µ_{kp}), we obtain Eq.(S10).

Convergence Proof of Algorithm 1
Proof. We use the auxiliary function approach. A function Z(G, G') is an auxiliary function of J(G) if, for any G, G',

\[
Z(G, G') \;\ge\; J(G), \qquad Z(G, G) \;=\; J(G).
\tag{S13}
\]

Once an auxiliary function is found, we define the update

\[
G^{(t+1)} = \arg\min_{G}\, Z\bigl(G, G^{(t)}\bigr).
\tag{S14}
\]

Then, by construction, we have

\[
J\bigl(G^{(t)}\bigr) = Z\bigl(G^{(t)}, G^{(t)}\bigr) \;\ge\; Z\bigl(G^{(t+1)}, G^{(t)}\bigr) \;\ge\; J\bigl(G^{(t+1)}\bigr).
\]

This proves that J(G^{(t)}) is monotonically decreasing. The key steps in the remainder of the proof are: (1) finding an appropriate auxiliary function of the objective function; (2) finding the global minimum of the auxiliary function; (3) showing that the update rule of Algorithm 1 attains the global minimum in Eq.(S14).
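To make the descent chain concrete, here is a minimal, self-contained sketch of the auxiliary-function (majorize-minimize) argument on a toy one-dimensional objective; the particular J and Z below are illustrative assumptions, not the paper's objective:

```python
import math

# Toy objective and a genuine auxiliary function for it.
def J(g):
    return 0.5 * g**2 + math.cos(g)

def Z(g, gp):
    # Second-order expansion of J around g' with curvature L = 2.
    # Since J''(g) = 1 - cos(g) <= 2 everywhere, Z(g, g') >= J(g) for all g
    # and Z(g', g') = J(g'), i.e. condition Eq.(S13) holds.
    L = 2.0
    return J(gp) + (gp - math.sin(gp)) * (g - gp) + 0.5 * L * (g - gp)**2

g = 3.0
for t in range(30):
    g_new = (g + math.sin(g)) / 2.0   # closed-form arg min_g Z(g, g^(t)), cf. Eq.(S14)
    # Descent chain: J(g^(t+1)) <= Z(g^(t+1), g^(t)) <= Z(g^(t), g^(t)) = J(g^(t)).
    assert J(g_new) <= Z(g_new, g) <= Z(g, g) + 1e-12
    g = g_new
```

Each iterate minimizes the auxiliary function built at the previous point, and the asserted chain is exactly the monotone-decrease argument above.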

Auxiliary function
The objective function Eq.(7) is restated as Eq.(S16), where A = (X^T X)^+ ≥ 0 and B = (X^T X)^- ≥ 0 are the positive and negative parts of X^T X, so that X^T X = A - B. We now construct an auxiliary function Z(G, G') by introducing upper bounds on the first two terms of Eq.(S16) and lower bounds on the last two terms (ignoring their negative signs).
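Concretely, the positive and negative parts can be computed elementwise; a minimal sketch, assuming the standard definition M^+ = (|M| + M)/2 and M^- = (|M| - M)/2:

```python
import numpy as np

# Elementwise positive/negative-part split of the mixed-sign Gram matrix X^T X.
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 10))   # illustrative data matrix
M = X.T @ X                        # mixed-sign Gram matrix

A = (np.abs(M) + M) / 2            # A = (X^T X)^+ >= 0
B = (np.abs(M) - M) / 2            # B = (X^T X)^- >= 0

assert np.all(A >= 0) and np.all(B >= 0)
assert np.allclose(M, A - B)       # X^T X = A - B
```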
The upper bound of the first term of Eq.(S16) is Eq.(S18), derived from the elementary inequality 2gg' ≤ g² + g'². Clearly, Eq.(S18) becomes an equality when G = G', satisfying Eq.(S13).
The upper bound of the 2nd term is Eq.(S20). The lower bound of the 3rd term is Eq.(S21), where we use z ≥ 1 + log z for z > 0, setting z to the ratio of the corresponding entries of G and G'. Eq.(S21) becomes an equality when G = G', satisfying Eq.(S13).
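For completeness, the bound z ≥ 1 + log z is the tangent-line inequality for the concave logarithm; a one-line derivation:

```latex
% Let f(z) = z - 1 - \log z. Then f'(z) = 1 - 1/z, so f decreases on (0,1),
% increases on (1,\infty), and f(1) = 0. Hence
\[
  z - 1 - \log z \;\ge\; 0
  \quad\Longleftrightarrow\quad
  z \;\ge\; 1 + \log z , \qquad z > 0,
\]
% with equality iff z = 1, which is why Eq.(S21) is tight at G = G'.
```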
The lower bound of the 4th term is Eq.(S22), where R = Tr[G^T A G G^T G]. Eq.(S22) becomes an equality when G = G', satisfying Eq.(S13). This completes the construction of the auxiliary function Z(G, G').