Abstract
Tensor-based subspace clustering algorithms have garnered significant attention for their high efficiency in clustering high-dimensional data. However, when dealing with 2D image data, the vectorization step in most traditional algorithms tends to destroy the correlations among higher-order tensor entries. To tackle this limitation, this paper proposes a non-convex submodule clustering approach (2D-NLRSC) that leverages sparse and low-rank representations for 2D image data. An ℓr-induced tensor nuclear norm is introduced to approximate the tensor rank precisely. Instead of vectorizing each 2D image, the framework arranges samples as lateral slices of a third-order tensor and employs the t-product operation to generate an optimal representation tensor under a low-rank constraint. The proposed method combines ℓq-norm induced clustering awareness with Laplacian regularization to obtain a representation tensor with a block-diagonal structure. Additionally, 2D-NLRSC incorporates the ℓ2,p-norm as a regularization term, taking advantage of its excellent invariance, continuity, and differentiability. Experimental results on real image datasets validate the superior performance of the 2D-NLRSC model.
Citation: Yang M, Han S, Chen L, Wang J (2026) Interpretable nonconvex submodule clustering algorithm using ℓr-induced tensor nuclear norm and ℓ2,p column sparse norm with global convergence guarantees. PLoS One 21(1): e0339534. https://doi.org/10.1371/journal.pone.0339534
Editor: Longxiu Huang, Michigan State University, UNITED STATES OF AMERICA
Received: August 25, 2025; Accepted: December 7, 2025; Published: January 2, 2026
Copyright: © 2026 Yang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The complete source code supporting the findings of this study has been deposited in the GitHub repository at https://github.com/HSM-1-1/Code.git and is publicly available.
Funding: This work was supported by Heilongjiang Provincial Natural Science Foundation Joint Guidance Project No. JJ2024LH0820.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
High-dimensional data often resides within lower-dimensional subspaces, revealing underlying subspace structures [1]. This insight underpins subspace clustering, which assumes that data points are sampled from multiple subspaces, with each point being linearly representable by a subset of points within the same subspace. The objective of subspace clustering is to divide data into clusters, where the points within each cluster lie in the same subspace. To address these challenges, a range of subspace clustering techniques have been proposed. Traditional subspace clustering methods, for example, Sparse Subspace Clustering (SSC) [2] and Low-Rank Representation (LRR) [3], have been extensively employed to partition data into multiple, potentially overlapping linear subspaces, thereby minimizing redundancy. These techniques have proven highly effective across various applications involving high-dimensional data, such as image clustering and temporal video segmentation [3]. Among the most prominent techniques, spectral clustering-based approaches have gained substantial traction over the past decade. These approaches involve constructing an affinity matrix and applying algorithms such as K-means or Normalized Cuts (Ncut). The versatility of these algorithms in diverse clustering tasks has been a key factor in their growing popularity. Recently, several works have focused on addressing performance degradation caused by noise in multi-view clustering. For example, Zhou et al. [4] proposed the RTOSNMF method, which tackles the issues of noise interference and insufficient utilization of inter-layer relationships in multi-layer network community detection by performing denoising via linear separation and a sparse norm, as well as exploring inter-layer relationships through a nuclear norm-constrained low-rank property. Che et al.
[5] put forward a robust multi-view clustering method based on weighted low-rank tensor approximation and noise separation: on the one hand, it designs differentiated constraints for different types of noise to achieve fine-grained noise elimination; on the other hand, it efficiently explores high-order correlations among multiple views through weighted low-rank tensor modeling. Xie et al. [6] reformulated the min-cut problem as a bi-bounded constraint problem and developed an algorithm suitable for size-constrained min-cut scenarios and generalizable to broader bi-bounded nonlinear optimal transport problems. Shi et al. [7] proposed an optimal transport framework with both upper and lower bound constraints to enhance clustering and classification by better capturing structural relationships in data.
However, conventional subspace clustering methods encounter challenges when dealing with high-dimensional tensor data [2,3]. This challenge lies at the core of the current research. These techniques often lack robust theoretical guarantees for clustering accuracy, particularly due to the high dimensionality of the data, which requires substantial time and memory resources. Our framework addresses this gap by providing a robust, tailored solution for image data clustering.
When handling imaging data, traditional subspace clustering methods often vectorize data samples, enabling the application of algorithms designed for vectorial data. While effective in some cases, this vectorization process disrupts the intrinsic multidimensional structure of images, thereby reducing the reliability of subsequent analysis. It increases the dimensionality of the data, exacerbating the curse of dimensionality, and it also destroys the inherent high-dimensional structure of the data. To address this limitation, various works have leveraged multilinear algebra tools to preserve and exploit the spatial characteristics of imaging data [8–10]. One significant innovation in this domain is the t-product, a matrix-analogous multiplication operation for third-order tensors that improves the exploitation of their internal structure. In this framework, the representation matrices from various views are arranged into slices of a three-dimensional tensor [11–13]. Utilizing operations from multilinear and abstract algebra enables the more effective exploration of third-order tensor properties. The introduction of the t-product simplifies tensor-tensor multiplication, enabling more efficient and accurate modeling of image data [14,15].
For instance, Wu [16] proposed a tensor-based submodule clustering method for 2D imaging data, which organizes samples as lateral slices of third-order tensors via the t-product and integrates low-rank constraints, manifold regularization, and a unified ADMM-spectral clustering framework with nonlinear extensions. Francis et al. [17] developed a robust unsupervised tensor subspace clustering method using a regularized tensor nuclear norm (TNN) to preserve geometric structures and enable slice-wise sparse and low-rank decomposition for noise removal. Madathil et al. [18] introduced a noise-robust tensor clustering approach via reweighted nuclear norms for enhanced low-rank representation, structured sparsity, and explicit noise separation, achieving high accuracy under severe corruption. Francis et al. [19] proposed a single-stage framework for incomplete imaging data, unifying tensor clustering (via sparse t-linear combinations with mode-3 low-multirank constraints) and missing-data reconstruction (through low-rank lateral slice approximations), thus preserving spatial structure.
Inspired by the above, we adopt the t-product, based on circular convolution, to model the dynamic characteristics of consecutive image sequences. Using the t-product, image data samples are grouped into a third-order tensor and represented by our proposed tensor low-rank model, built around the union of free submodules. The resulting affinity information is then utilized for final clustering.
By combining t-product operations and tensor factorization, we extend the traditional LRR clustering method to accommodate multi-view data. This paper employs sparse and low-rank representations, similar to those mentioned above. However, it incorporates the latest developments in sparse coding to propose a highly interpretable ℓr-induced tensor nuclear norm. The optimization of this norm is based on studies of the non-convex ℓr-norm by Candes et al. [20], Zuo et al. [21], and Zha et al. [22], combining sparsity and low-rank properties.
Peng et al. [23] proposed a novel NMF algorithm with a column-sparse (pseudo) norm applied to the factor matrix to enforce sparse properties. Inspired by this novel column-sparse norm and influenced by [22], we naturally generalize the above definition along the frontal slice direction to a tensor-based ℓ2,p-induced sparse approximation, thereby proposing a tensor ℓ2,p-(pseudo) norm.
A dissimilarity matrix M is constructed to impose constraints on the frontal slices of the representation tensor. Specifically, elements with smaller values (indicating higher similarity) in the dissimilarity matrix are used to inversely constrain the corresponding positions in the slices of the representation tensor to take on larger values. This approach enhances the representation tensor, yielding a more distinct cluster structure. We show that the nonconvex ℓq-norm, combined with our cluster-aware construction, effectively captures the block structure of the self-representation tensor. With an appropriate value of q, the ℓq-norm offers unique advantages in sparsity representation.
The key contributions of our paper are as follows.
- The proposed 2D-NLRSC algorithm leverages the submodular self-expression property to address the issue in traditional tensor subspace clustering where vectorization of 2D images destroys higher-order tensor correlations. It directly constructs image samples as lateral slices of a third-order tensor, effectively avoiding the structural information loss caused by vectorization.
- The ℓr-induced tensor nuclear norm is employed to accurately approximate the tensor rank and is combined with t-product operations to construct an optimal representation tensor under low-rank constraints, resolving the problem that traditional convex tensor nuclear norms fail to precisely characterize tensor low-rank structures.
- The ℓ2,p-norm is incorporated into 2D-NLRSC as a regularization term, utilizing its excellent invariance, continuity, and differentiability to enhance the model's resilience to outliers while optimizing the selection of representative data points to reduce redundancy. Additionally, the clustering-aware property of the ℓq-norm is combined with Laplacian regularization to guide the representation tensor toward block-diagonal structures consistent with clustering objectives.
- An efficient alternating direction method of multipliers (ADMM) algorithm is proposed, with its convergence rigorously proven based on the Karush-Kuhn-Tucker (KKT) conditions. Experiments conducted on real-world datasets validate the effectiveness of the proposed method.
The remainder of this paper is organized as follows. Sect 2 introduces the preliminaries. Sect 3 reviews the related work. Sect 4 presents the proposed 2D-NLRSC method and its optimization procedure. Experimental results are reported in Sect 5, while Sect 6 provides the convergence analysis. Finally, Sect 7 concludes the paper.
2 Notations and preliminary considerations
Here, the relevant definitions and preliminary notations for the variables are first presented. For the sake of conciseness, the related notations are summarized in Table 1.
Definition 1. (t-product [10]): Let A ∈ ℝ^(n1×n2×n3) and B ∈ ℝ^(n2×n4×n3). Then the t-product C = A * B is a tensor of size n1 × n4 × n3, i.e.,

C = A * B = fold(bcirc(A) · unfold(B)),

where bcirc(A) is defined as the block-circulant matrix

bcirc(A) = [A^(1), A^(n3), …, A^(2); A^(2), A^(1), …, A^(3); …; A^(n3), A^(n3−1), …, A^(1)],

with A^(k) denoting the k-th frontal slice of A. The operator unfold and its corresponding inverse operator fold are defined as

unfold(A) = [A^(1); A^(2); …; A^(n3)], fold(unfold(A)) = A.
Theorem 1. (t-SVD [24]): For A ∈ ℝ^(n1×n2×n3), the t-SVD of A is given by

A = U * S * V^⊤,

where U ∈ ℝ^(n1×n1×n3) and V ∈ ℝ^(n2×n2×n3) are orthogonal tensors, S ∈ ℝ^(n1×n2×n3) is an f-diagonal tensor, and * denotes the t-product.
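As a minimal numerical sketch of the t-product above, the operation can be computed slice-wise in the Fourier domain, which is equivalent to the fold/bcirc/unfold definition. The function name `t_product` is illustrative, not taken from the paper's released code:

```python
import numpy as np

def t_product(A, B):
    """t-product C = A * B for A (n1 x n2 x n3) and B (n2 x n4 x n3):
    DFT along mode 3, slice-wise matrix products, inverse DFT back."""
    n3 = A.shape[2]
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    # real input tensors give conjugate-symmetric slices, so the result is real
    return np.real(np.fft.ifft(Cf, axis=2))
```

The FFT route avoids ever forming the large block-circulant matrix bcirc(A), yet produces exactly the same tensor.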
Definition 2. (tensor ℓ2,p-norm [25]): For a matrix E ∈ ℝ^(m×n) with columns e_1, …, e_n, the ℓ2,p-norm is defined as

‖E‖_{2,p} = (Σ_{i=1}^{n} ‖e_i‖_2^p)^{1/p},

where 0 < p ≤ 1. Particularly, considering p = 1, the ℓ2,p-norm reduces to the ℓ2,1-norm:

‖E‖_{2,1} = Σ_{i=1}^{n} ‖e_i‖_2.

We then extend the matrix ℓ2,p-norm to the tensor ℓ2,p-norm as follows: for E ∈ ℝ^(n1×n2×n3) with frontal slices E^(k),

‖E‖_{2,p} = (Σ_{k=1}^{n3} Σ_{i=1}^{n2} ‖e_i^(k)‖_2^p)^{1/p},

where e_i^(k) denotes the i-th column of E^(k).
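Under this definition (one common convention; the paper's exact normalization may differ), the norm is straightforward to evaluate. The helper names `l2p_norm_matrix` and `l2p_norm_tensor` are our own:

```python
import numpy as np

def l2p_norm_matrix(E, p):
    """(sum_i ||e_i||_2^p)^(1/p) over the columns e_i of E, 0 < p <= 1."""
    return float((np.linalg.norm(E, axis=0) ** p).sum() ** (1.0 / p))

def l2p_norm_tensor(T, p):
    """Tensor extension: aggregate column norms over every frontal slice."""
    total = sum((np.linalg.norm(T[:, :, k], axis=0) ** p).sum()
                for k in range(T.shape[2]))
    return float(total ** (1.0 / p))
```

For p = 1 the matrix version reduces to the ℓ2,1-norm, i.e., the plain sum of column ℓ2 norms.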
3 Related work
Consider a data set that contains N images partitioned into m categories. Conventional subspace clustering pipelines vectorize each image X_i ∈ ℝ^(n1×n3) into a flattened vector x_i ∈ ℝ^D with D = n1 n3, constructing a data matrix X = [x_1, …, x_N] ∈ ℝ^(D×N). These vectors are hypothesized to reside near a union of m low-dimensional subspaces embedded in ℝ^D. The clustering objective is to group data points based on their intrinsic subspace affiliations.
Although the above vectorization approach has demonstrated excellent performance in numerous applications, it fails to account for the spatial structure of images. Additionally, these methods, which are typically used to represent the linear combinations of samples in subspaces, are unable to capture shifted copies within submodules. In contrast, the t-product [10] offers a novel algebraic approach that generalizes matrix multiplication to third-order tensors.
3.1 Linear algebra with the t-product
To preserve the spatial structure of data, N oriented matrices of size n1 × 1 × n3 are stacked into a third-order tensor X ∈ ℝ^(n1×N×n3). Here, ℝ^(1×1×n3) denotes the set of tube fibers of dimension 1 × 1 × n3, and ℝ^(n1×1×n3) denotes the set of oriented matrices of size n1 × 1 × n3. The goal is to define a multiplication operation between tube fibers, enabling "linear" combinations of oriented matrices in which the coefficients are themselves tube fibers rather than scalar values.

Following [26], the set ℝ^(n1×1×n3) can be regarded as a module over the ring ℝ^(1×1×n3). From this viewpoint, the t-product provides a natural generalization of matrix multiplication to third-order tensors, where the multiplication between elements is replaced by tube fiber multiplication, and the addition remains elementwise.
3.2 Representation of high-dimensional image data
Given an image matrix X_i ∈ ℝ^(n1×n3), we embed it into a third-order tensor X⃗_i ∈ ℝ^(n1×1×n3) by orienting it along the third mode, so that X⃗_i is a lateral slice whose tube fibers have length n3. Let {X⃗_1, …, X⃗_N} denote the collection of N such oriented tensors.

In the t-product framework, ℝ^(n1×1×n3) is viewed as a free module of rank n1 over the ring ℝ^(1×1×n3). Given a generating set (dictionary) of oriented matrices {B⃗_j}, any X⃗ admits a t-linear representation

X⃗ = Σ_j B⃗_j * c_j,

where the coefficients c_j ∈ ℝ^(1×1×n3) are tube fibers. For computational purposes, let B collect the dictionary elements as lateral slices and C stack the coefficient tubes. Then the above expression can be compactly written as

X⃗ = B * C.

Applying the discrete Fourier transform (DFT) along the third mode yields

X̄^(k) = B̄^(k) C̄^(k), k = 1, …, n3,

where X̄^(k), B̄^(k), and C̄^(k) denote the k-th frontal slices in the Fourier domain. This block-diagonal structure allows independent processing of each frequency slice, retaining spatial structure while enabling efficient computation.
By preserving the tensor structure of the original images, this representation mitigates the loss of spatial correlations caused by vectorization, and naturally supports linear modeling in the tensor algebra framework.
3.3 Submodule clustering by sparse and low-rank representation
Based on this theoretical framework, the sparse submodule clustering (SSmC) algorithm, as introduced in [26], can be expressed as a sparse self-representation problem: each oriented sample is written as a t-linear combination of the others, X = X * C, while a sparsity-promoting penalty is imposed on the representation tensor C.
Inspired by the concept that images from different free submodules [27] should exhibit low correlation, Wu et al. proposed the structurally constrained low-rank submodule clustering (SCLRSmC) method in [14]. Its learning process augments the low-rank self-representation objective with a weighted sparsity term on W ⊙ C, where W is a predefined data-dependent weight matrix and ⊙ represents the Hadamard product.
Additionally, Wu [28] proposed an online low-rank tensor subspace clustering (OLRTSC) algorithm based on the nonconvex reformulation of tensor low-rank representation (TLRR) and the t-SVD framework, aiming to efficiently recover and cluster tensor data. This approach significantly reduces computational complexity and storage costs, handles dynamic data, and extends to scenarios with missing data. He also introduced an outlier-robust tensor low-rank representation (OR-TLRR) method, which simultaneously performs outlier detection and tensor data clustering within the t-SVD framework [29]. Research shows that t-SVD effectively reduces data dimensionality, extracts key features, and enhances data processing efficiency.
Although the above algorithms have shown promising results on real datasets, approximating the tensor rank with the nuclear norm remains imprecise. Since the nuclear norm treats every singular value equally and penalizes noise components, it can lead to a suboptimal representation tensor. Therefore, this approach still requires improvement.
Recently, Jobin et al. [17,30] proposed a multi-view data clustering framework based on nonconvex low-rank tensor approximation.
Definition 3. (nonconvex induced tensor nuclear norm [30]): Consider a tensor A ∈ ℝ^(n1×n2×n3) with t-SVD A = U * S * V^⊤; then its induced tensor nuclear norm can be expressed as

‖A‖_φ = (1/n3) Σ_{k=1}^{n3} Σ_i φ(S̄^(k)(i, i)),

where φ(·) is a nonconvex penalty applied to the singular values of the Fourier-domain frontal slices.
Notably, this framework employs the nonconvex induced tensor nuclear norm (TNN) as a low tensor rank constraint, which enhances the low-rank property of the representation tensor. However, Zuo et al. [21] demonstrated in the sparse coding context that a fixed choice of the exponent is not necessarily optimal, and a general r ∈ (0, 1) should be considered to balance rank sparsity and numerical stability. The generic ℓr-norm minimization problem takes the form

min_x (1/2)(y − x)^2 + λ|x|^r,

where y is the input value (e.g., a singular value), λ > 0, and 0 < r < 1.
Problem (15) can be efficiently solved by the generalized soft-thresholding (GST) algorithm [21], which solves the scalar problem by an iterative thresholding rule. Specifically, given y, the GST update is x* = T_r^{GST}(y; λ), where T_r^{GST} is applied element-wise:

T_r^{GST}(y; λ) = 0, if |y| ≤ τ_r^{GST}(λ); sgn(y) z*, otherwise,

with threshold τ_r^{GST}(λ) = (2λ(1−r))^{1/(2−r)} + λr(2λ(1−r))^{(r−1)/(2−r)} and z* being the positive root of z − |y| + λr z^{r−1} = 0.
As pointed out by Zha et al. [22], ℓr minimization with r < 1 can more aggressively promote sparsity (and thus low rank) compared to the convex r = 1 case, often leading to better empirical performance in subspace clustering tasks.
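The GST rule described above can be sketched as follows, with the root z* found by fixed-point iteration started at |y|. This is an illustrative implementation of the threshold and iteration from [21], not the paper's released code:

```python
import numpy as np

def gst(y, lam, r, iters=10):
    """Generalized soft-thresholding for min_x 0.5*(x - y)^2 + lam*|x|^r,
    0 < r < 1, applied element-wise to the array y."""
    y = np.asarray(y, dtype=float)
    # below this threshold the minimizer is exactly 0
    tau = (2 * lam * (1 - r)) ** (1 / (2 - r)) \
        + lam * r * (2 * lam * (1 - r)) ** ((r - 1) / (2 - r))
    x = np.zeros_like(y)
    m = np.abs(y) > tau
    z = np.abs(y[m])                       # start fixed-point iteration at |y|
    for _ in range(iters):
        z = np.abs(y[m]) - lam * r * z ** (r - 1)
    x[m] = np.sign(y[m]) * z
    return x
```

The same scalar rule is reused later both on singular values (for the ℓr-induced TNN) and on column norms (for the ℓ2,p term).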
Inspired by the above, we introduce the ℓr-induced tensor nuclear norm to enforce low-rank structure in the tensor. Unlike [30], our method allows adjusting the value of r between 0 and 1, enabling a more flexible and effective low-rank representation of the tensor.
Definition 4. (ℓr-induced tensor nuclear norm): Consider a tensor A ∈ ℝ^(n1×n2×n3) with t-SVD A = U * S * V^⊤. Its ℓr-induced TNN can be defined as

‖A‖_{⊛,r} = (1/n3) Σ_{k=1}^{n3} Σ_i (S̄^(k)(i, i))^r,

where 0 < r < 1, U ∈ ℝ^(n1×n1×n3) and V ∈ ℝ^(n2×n2×n3) are orthogonal tensors, S is an f-diagonal tensor whose frontal slices contain diagonal matrices, and S̄ denotes its Fourier transform.
Xie et al. proposed a nonconvex tensor multi-view clustering framework [25], which introduces a novel column-sparse norm instead of the squared Frobenius norm for the error term: the ℓ2,p-norm with 0 < p < 1. This norm exhibits properties such as invariance, continuity, and differentiability. Inspired by [25], this paper applies the ℓ2,p-norm to 2D images.
4 Proposed method
4.1 Problem formulation and objective function
Recognizing that most conventional clustering methods rely on vectorized data representations and often overlook the intrinsic structure of image data, we focus on leveraging tensor representations, particularly for 2D images. Consequently, we propose a novel image submodule clustering method that preserves and leverages the inherent structure of image data to achieve more robust and reliable clustering outcomes.
For the 2D image samples X mentioned above, rather than vectorizing each image as in traditional subspace clustering methods, the images are arranged side by side and rotated to form a third-order tensor X ∈ ℝ^(n1×N×n3), where n1 × n3 denotes the size of the images and N indicates the number of images. Each image sample is represented using t-linear combinations. To obtain the optimal low-rank representation tensor Z, we apply an ℓr-induced tensor nuclear norm on Z. We add a sparse norm, the ℓ2,p-norm with 0 < p < 1, to strengthen the robustness of our approach. A slice-wise Laplacian regularization term centered around each image is also introduced to retain local details and expose nonlinear structures in high-dimensional space. This regularization guarantees consistent representations of data points within each image in the global space, enhancing overall efficiency. Therefore, the combination of sliced Laplacian regularization, the ℓq term, and the ℓr-induced TNN can be represented as follows
where Z is the representation tensor, E is the error tensor, and L_W is the Laplacian matrix of W, the adjacency matrix of a k-nearest neighbor (KNN) graph. It can be constructed by

W_ij = 1, if x_j ∈ N_k(x_i) or x_i ∈ N_k(x_j); W_ij = 0, otherwise.
Here, N_k(x_i) denotes the k-neighborhood of x_i under a given distance metric d(·, ·). Specifically, we define d(x_i, x_j) = ‖X̃_i − X̃_j‖_F, where X̃ is derived by normalizing each lateral slice of X such that ‖X̃_i‖_F = 1 for all i.
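The KNN graph and its unnormalized Laplacian L = D − W can be sketched as below, following the text's description (Frobenius distance between normalized lateral slices, symmetrized neighborhoods). The function name `knn_laplacian` is our own:

```python
import numpy as np

def knn_laplacian(X, k):
    """Unnormalized graph Laplacian L = D - W from a symmetric KNN graph.
    X: (n1, N, n3) data tensor; X[:, i, :] is the i-th lateral slice."""
    N = X.shape[1]
    # normalize each lateral slice to unit Frobenius norm (assumed metric)
    S = np.stack([X[:, i, :] / np.linalg.norm(X[:, i, :]) for i in range(N)])
    dist = np.array([[np.linalg.norm(S[i] - S[j]) for j in range(N)]
                     for i in range(N)])
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist[i])[1:k + 1]   # skip the sample itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                    # x_j in N_k(x_i) OR x_i in N_k(x_j)
    L = np.diag(W.sum(axis=1)) - W
    return L, W
```

The resulting L is symmetric positive semidefinite, which is what the slice-wise Laplacian regularization term relies on.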
The degree matrix D is a diagonal matrix whose i-th diagonal element is computed as D_ii = Σ_j W_ij. Relevant details can be found in [25]. Furthermore, a suitable block-diagonal structure of the representation tensor is beneficial for clustering multi-view data, enhancing the algorithm's performance. In multi-view data, objects that belong to the same submodule exhibit a strong correlation, while objects from different submodules show relatively lower correlation [14]. To enforce a block-diagonal structure on Z, we prioritize penalizing images that exhibit lower correlation, i.e., those belonging to different submodules. The correlation between different data points can be captured by the inconsistency weighted matrix M. The elements of M are defined as
M_ij = 1 − exp(−(1 − |⟨X̃_i, X̃_j⟩|)/σ),

where X̃ represents the tensor acquired by normalizing every lateral slice of X such that ‖X̃_i‖_F = 1 for i = 1, …, N, and σ is typically taken as the empirical average of all 1 − |⟨X̃_i, X̃_j⟩|.
Unlike (12), we obtain a sparser solution by substituting the ℓ1-norm with the ℓq-norm. By integrating all the above, our algorithm can be summarized as follows

min_{Z,E} ‖Z‖_{⊛,r} + λ1 ‖M ⊙ Z‖_q^q + λ2 ‖E‖_{2,p}^p + λ3 Σ_{k=1}^{n3} tr(Z^(k) L_W (Z^(k))^⊤), s.t. X = X * Z + E,

where λ1, λ2, and λ3 are balance parameters. The flowchart of 2D-NLRSC is shown in Fig 1. Our method processes the input image data as a third-order tensor, preserving the original spatial structure of images and avoiding the structural information loss caused by vectorization. The first term employs the ℓr-induced tensor nuclear norm to impose a low-rank constraint on the representation tensor, thereby capturing the global subspace structure of the data. The second term applies the ℓq-norm to the Hadamard product of the representation tensor and the dissimilarity weighted matrix, promoting element-wise sparsity and forming a block-diagonal structure conducive to clustering. The third term uses the ℓ2,p-norm to limit the error tensor, boosting column-wise sparsity and model robustness against outliers and noise. The fourth term uses Laplacian regularization to exploit the local geometric structure of the data, ensuring that the representation preserves neighborhood relations on the intrinsic manifold and thus strengthens intra-cluster cohesion and inter-cluster separation.
By solving (22), the optimal representation tensor Z* can be used to construct affinity matrices. The affinity matrix S is formulated as

S = (1/2)(|Ẑ| + |Ẑ|^⊤), with Ẑ_ij = Σ_{k=1}^{n3} |Z*^(k)(i, j)|.

Then, spectral clustering or the normalized graph cut algorithm is applied to the affinity matrix S to derive the final clustering outcomes.
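The final step can be sketched as below: aggregate the absolute frontal slices of the representation tensor into a symmetric affinity matrix (one common symmetrization; the paper's exact formula may differ), then embed with the normalized graph Laplacian before K-means. The helper names are our own:

```python
import numpy as np

def affinity_from_representation(Z):
    """Symmetric, nonnegative affinity from a representation tensor Z (N x N x n3):
    average absolute entries over frontal slices, then symmetrize."""
    A = np.abs(Z).mean(axis=2)
    return 0.5 * (A + A.T)

def spectral_embedding(S, m):
    """Rows of the m bottom eigenvectors of the normalized Laplacian;
    cluster these rows with K-means to obtain the final labels."""
    d = S.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    Lsym = np.eye(len(S)) - Dinv @ S @ Dinv
    _, vecs = np.linalg.eigh(Lsym)          # ascending eigenvalues
    return vecs[:, :m]
```

Any off-the-shelf spectral clustering routine accepting a precomputed affinity would serve the same purpose.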
4.2 Optimization
Three auxiliary variables are introduced into our optimization algorithm, which can be expressed as follows
The augmented Lagrangian function for this problem can be expressed as
1) Subproblem: Fixing all the other variables, the update is given by
Theorem 2. Let A ∈ ℝ^(n1×n2×n3) have the t-SVD decomposition A = U * S * V^⊤. Consider the ℓr-induced tensor nuclear norm optimization problem

min_X τ‖X‖_{⊛,r} + (1/2)‖X − A‖_F^2;

the optimal solution is given by

X* = U * S_GST * V^⊤,

where S_GST is an f-diagonal tensor whose diagonal elements (in the Fourier domain) are generated by the GST algorithm described in (16).
Applying Theorem 2, the optimal solution of (26) is
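A hedged sketch of the Theorem 2 procedure: SVD each Fourier-domain frontal slice, shrink the singular values with GST, and transform back. `prox_lr_tnn` is an illustrative name, and the compact `gst` helper is repeated here for self-containment:

```python
import numpy as np

def gst(y, lam, r, iters=10):
    """Scalar GST rule of (16), element-wise on arrays (here: singular values)."""
    tau = (2 * lam * (1 - r)) ** (1 / (2 - r)) \
        + lam * r * (2 * lam * (1 - r)) ** ((r - 1) / (2 - r))
    x = np.zeros_like(y)
    m = np.abs(y) > tau
    z = np.abs(y[m])
    for _ in range(iters):
        z = np.abs(y[m]) - lam * r * z ** (r - 1)
    x[m] = np.sign(y[m]) * z
    return x

def prox_lr_tnn(A, tau, r):
    """Proximal step for the lr-induced TNN: slice-wise SVD in the Fourier
    domain, GST shrinkage of singular values, inverse FFT back."""
    n3 = A.shape[2]
    Af = np.fft.fft(A, axis=2)
    Xf = np.empty_like(Af)
    for k in range(n3):
        U, s, Vh = np.linalg.svd(Af[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * gst(s, tau, r)) @ Vh   # rescale columns of U by shrunk s
    return np.real(np.fft.ifft(Xf, axis=2))
```

Since shrinkage acts only on singular values, the conjugate symmetry of the Fourier slices of a real input is preserved, so the inverse FFT is real up to round-off.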
2) Subproblem: The update for
is
Following the GST algorithm described in (16), the closed-form solution is
where .
3) Subproblem: The update for
is
Setting the partial derivatives with respect to to zero gives
The optimal solution is derived as
4) Subproblem: The update rule for
is formulated as
where , and
is the v-th frontal slice of
.
Let e_i denote the i-th column of E, and let g_i denote the i-th column of the corresponding input matrix. The objective in (35) can be reformulated column-wise as
so that each can be solved separately. For a specific
, the subproblem becomes
Here, each column e_i can be treated as a special matrix, and a thin SVD can be applied. It follows that e_i has exactly one singular value, given by σ(e_i) = ‖e_i‖_2, where σ(·) represents the singular value of the input vector. Hence, the subproblem is equivalent to
Now, we introduce Lemma 1.
Lemma 1. [31] Consider two complex matrices A and . Let
be defined as
where represents the vector consisting of the non-increasing singular values of A. If
is a complex unitarily invariant function (or quasi-norm) and B has an SVD formulated as
then the optimal solution for the minimization problem
has the SVD , where
and
According to Lemma 1, the solution to (39) is expressed as
where ui and are respectively the left and right singular vectors corresponding to
, and
which can be computed using the GST algorithm in (16).
It is evident that (refer to 7.2 of [32]) e_i = (e_i/‖e_i‖_2) · [‖e_i‖_2] · [1] denotes a thin SVD of e_i, where [1] represents a matrix with 1 as its only element. By substituting this into (45), we can derive the closed-form column-wise update.
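The column-wise solution just derived amounts to a scalar GST shrinkage on each column's ℓ2 norm followed by rescaling the column direction. A hedged sketch (`prox_l2p_columns` is our own name; `gst` is repeated for self-containment):

```python
import numpy as np

def gst(y, lam, r, iters=10):
    """Scalar GST rule of (16), element-wise on arrays (here: column norms)."""
    tau = (2 * lam * (1 - r)) ** (1 / (2 - r)) \
        + lam * r * (2 * lam * (1 - r)) ** ((r - 1) / (2 - r))
    x = np.zeros_like(y)
    m = np.abs(y) > tau
    z = np.abs(y[m])
    for _ in range(iters):
        z = np.abs(y[m]) - lam * r * z ** (r - 1)
    x[m] = np.sign(y[m]) * z
    return x

def prox_l2p_columns(G, lam, p):
    """Column-wise proximal of lam*||E||_{2,p}^p + 0.5*||E - G||_F^2:
    each column keeps its direction, its norm is shrunk by GST (Lemma 1)."""
    E = np.zeros_like(G)
    norms = np.linalg.norm(G, axis=0)
    shrunk = gst(norms, lam, p)
    nz = norms > 0
    E[:, nz] = G[:, nz] * (shrunk[nz] / norms[nz])
    return E
```

Columns whose norms fall below the GST threshold are zeroed outright, which is exactly the column-sparse behavior the ℓ2,p error term is meant to induce.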
5) Subproblem: The update for
is
where . Using the discrete Fourier transform, the solution is
Overall, the algorithm is summed up in Algorithm 1.
Algorithm 1 The algorithm of 2D-NLRSC.
Input: Given data tensor , dissimilarity matrix
, and parameters
,
, and
.
Output: Representation tensor .
1: Initialization:
, penalty parameter
,
,
,
,
, and t = 0.
2: while not converge do
3: Fix the others and update by (50);
4: Fix the others and update by (45);
5: Fix the others and update by (29);
6: Fix the others and update by (31);
7: Fix the others and update by (34);
8: Fix the others and update the Lagrange multipliers ,
,
and
by
;
;
;
;
9: Update the parameter and
by
;
;
10: Check the convergence conditions; if they are satisfied,
11: Output: Representation tensor .
12: else, t = t + 1;
13: end while
4.3 Complexity analysis
The computational cost of our method primarily arises from the updates of and
. When updating
, it is necessary to compute the FFT and inverse FFT of an
tensor along mode-3, as well as perform the SVD of
matrices in the Fourier domain. These operations require
computations per iteration. For updating
, the process involves calculating the matrix inverse and the FFT and inverse FFT transformations. The computational complexity of these operations is
. Consequently, the total computational complexity of the method is approximately
, where T1 denotes the number of iterations required to solve (24) using ADMM.
5 Experiments
5.1 Datasets
Five image datasets are selected for this algorithm, and they are described in Table 2 as follows
- 1) ORL: The ORL dataset comprises 40 distinct subjects, each represented by 10 different images. The experimental settings align with those described in [16]. All 400 images are utilized, and each image is of a dimension of
.
- 2) JAFFE: The JAFFE dataset comprises 213 images of 7 facial expressions from ten Japanese female subjects. Ten participants were selected for this study, each contributing their first 20 images. All 200 images are resized to
. The experimental setup follows the same approach outlined in [16].
- 3) CMU-PIE: The CMU-PIE face dataset comprises 42,368 images of 68 subjects, exhibiting diverse poses, lighting conditions, and facial expressions. A subset of 735 images was created, containing the initial 49 images from each of the first 15 subjects. These images were subsequently resized to
. The specific experiments and settings are in [16].
- 4) Yale: The Yale face dataset consists of 165 images of 15 individuals, each with a size of
. 11 distinct facial images, exhibiting varied expressions and lighting, were provided by each participant. These images were uniformly resized to
. The experimental conditions are aligned with those described in [16].
- 5) MNIST: The MNIST dataset contains 70,000 centered 28 × 28 images of handwritten digits (0-9). A subset of 1,000 images was created by selecting the first 100 images of each digit. The specific experimental settings are described in [16]. For each value of L, the clustering process was repeated 20 times with randomly selected categories. The average of these 20 clustering results was then used for evaluation.
- 6) COIL-20: The COIL-20 dataset contains 1,440 images from 20 different object categories. Each object category has 72 images captured from different angles, and the backgrounds of the objects were removed during the shooting process. These images were processed and downsampled to a size of
.
In our experiment, Accuracy (ACC) and Normalized Mutual Information (NMI) are utilized as performance evaluation metrics. Higher values for both ACC and NMI indicate superior clustering performance.
5.2 Compared clustering algorithms
To evaluate the performance of the algorithm proposed in this paper, we selected several state-of-the-art clustering algorithms as benchmarks for our experimental comparisons:
- LRR [3] is a clustering algorithm that reveals the latent structure in data by constructing a low-rank representation matrix.
- SSC [2] solves a sparse optimization program to derive the sparse representation of data points from other points.
- LSR [33] leverages data correlation to achieve subspace segmentation through a grouping effect that tends to cluster highly correlated data together, significantly improving segmentation accuracy while ensuring efficiency.
- SC-LRR [34] extends the standard LRR by introducing a predefined weight matrix to analyze the structure of multiple disjoint subspaces, breaking through the restriction of the standard LRR that requires subspaces to be independent.
- TSC [35] achieves clustering of noisy and incompletely observed high-dimensional data into a union of low-dimensional subspaces and outliers by thresholding the correlations between data points to obtain an adjacency matrix.
- S3C [36] learns both the affinity matrix and segmentation results simultaneously through a joint optimization framework, expressing each data point as a structured, sparse linear combination of other data points.
- KSSC [37] extends sparse subspace clustering to nonlinear manifolds via the kernel trick, enhancing clustering performance through nonlinear mappings.
- SSmC [26] integrates the t-product operation into the sparse subspace clustering framework, enhancing the model’s ability to capture local features of data through its convolutional structure.
- SCLRSmC [14] introduces a free submodules theory, enabling low-rank representation learning directly in the tensor space and avoiding the inherent loss of structural information associated with traditional vectorization preprocessing.
- CLLRSmC [16] considers the intrinsic manifold structure of data and integrates the two distinct stages of learning a low-rank representation tensor and performing spectral clustering into a unified optimization framework.
- KCLLRSmC [16] combines manifold regularization with kernel methods for manifold clustering, eliminating the need for explicit mapping of data into the feature space. It serves as a nonlinear extension of CLLRSmC.
5.3 Experiments results and analysis
The subsequent experiments use ACC and NMI as clustering evaluation metrics. Their precise definitions are provided in [16]. Each experiment is replicated 20 times, and the average results are recorded in Tables 3, 4, 5, and 6. The best results are highlighted in bold. As shown in the tables, the proposed model significantly outperforms other models regarding both ACC and NMI across the ORL, JAFFE, CMU-PIE, Yale, and MNIST datasets.
The parameters are set as ,
,
on ORL;
,
,
on JAFFE.
The parameters are set as ,
,
on CMU-PIE;
,
,
on Yale.
The parameters are set as ,
,
on MNIST.
The parameters are set as ,
,
on COIL-20.
On the ORL dataset, our method outperforms all other models, achieving improvements of 0.50% in ACC and 1.50% in NMI compared to the suboptimal KCLLRSmC.
On the JAFFE dataset, as shown in Table 3, the proposed model achieves a perfect score of 100% for both ACC and NMI. In particular, our model outperforms the suboptimal KCLLRSmC model by 0.36% in ACC and 0.77% in NMI.
Experiments on the CMU-PIE dataset demonstrate that 2D-NLRSC achieves superior performance compared to existing methods, yielding the highest ACC and NMI scores. Specifically, it surpasses the suboptimal method by 2.32% in ACC and 0.54% in NMI.
On the Yale dataset, 2D-NLRSC achieved the highest ACC and NMI scores, surpassing the best-performing suboptimal method by 1.03% and 0.77%, respectively.
On the MNIST dataset, compared to all other methods, the proposed 2D-NLRSC shows significant improvements in ACC and NMI. When the number of categories L = 8, the NMI is slightly lower than those of KCLLRSmC and CLLRSmC, but outperforms other algorithms. Compared with KCLLRSmC, when L = 3, the ACC and NMI metrics are improved by 2.16% and 3.59%, respectively. Similarly, when L = 5 and L = 10, the ACC and NMI metrics are improved by {0.37%, 0.52%} and {0.08%, 0.03%}, respectively.
Experimental evaluations on the COIL-20 dataset, detailed in Fig 2, illustrate that the 2D-NLRSC method yields significant advancements in clustering performance, reflected by its elevated ACC and NMI scores.
In summary, the proposed 2D-NLRSC method achieves significant improvements on almost all datasets. These results strongly validate that 2D-NLRSC can more effectively capture high-order correlation information in samples through the ℓr-induced tensor rank approximation and the joint sparse regularization based on the ℓ2,p-norm.
5.4 Ablation studies
To validate the role of each regularization term in the model, ablation experiments were conducted. Specifically, partial regularization terms were removed from the 2D-NLRSC model, and parameters were adjusted to achieve its optimal performance. On the CMU-PIE and Yale datasets, three groups of experiments were designed, with each experiment omitting one regularization term to derive three algorithms. Through parameter optimization for each algorithm, the impact of each regularization term on the model's performance was evaluated. The specific experimental results are shown in Fig 3.
According to Fig 3, the proposed 2D-NLRSC method outperforms all other ablation variants in terms of clustering performance, as measured by ACC and NMI. Systematic ablation trials that remove individual regularization terms exhibit consistent performance deterioration across all incomplete configurations. On the CMU-PIE dataset, the complete 2D-NLRSC model improves ACC by {0.86%, 15.42%, 0.64%, 18.19%} and NMI by {1.22%, 7.34%, 0.83%, 12.29%} over the four ablation versions. The Yale dataset shows performance increases of {2.36%, 9.76%, 1.45%, 13.18%} in ACC and {1.84%, 8.59%, 0.61%, 9.02%} in NMI.
These empirical findings lend strong support to the efficacy of multi-component collaboration in our approach. The performance loss observed under the first ablation emphasizes the necessity of the norm regularization that works in conjunction with the inconsistency matrix M: this combination applies differentiated constraints to sample pairs from different submodules, promoting the construction of a distinct block-diagonal structure in the representation tensor. The performance drop under the second ablation confirms that the ℓ2,p-norm introduces crucial column-wise sparse constraints, effectively eliminating interference from redundant information and outliers. The degradation observed under the third ablation demonstrates that the Laplacian regularization term effectively preserves the local geometric structure of the data; its absence disrupts neighborhood relationships that are crucial for maintaining clustering coherence. Finally, the significant performance degradation in the dual-ablated scenario highlights the complementary nature of these regularization techniques, which jointly contribute to the improved clustering accuracy of 2D-NLRSC on complex datasets.
5.5 Sensitivity analysis
To comprehensively evaluate the robustness and stability of the proposed algorithm, we conduct a sensitivity analysis focusing on two key aspects: the parameter K in K-nearest neighbors (KNN) and the initialization of the core variables.
5.5.1 Sensitivity analysis of KNN parameter K.
We set the range of K from 5 to 30 and conduct experiments on both ORL and Yale datasets, with the specific results shown in Fig 4. The experimental data indicates that: on the ORL dataset, when K is in the range [5, 30], the fluctuation of the algorithm's accuracy ACC is controlled within 1.5%, and the fluctuation of normalized mutual information NMI does not exceed 1%; on the Yale dataset, the ACC fluctuation is within 4%, and the NMI fluctuation does not exceed 3.6%. This result fully demonstrates the algorithm's robustness to changes in K. Further analysis reveals that the relationship between K and algorithm performance is not linear. Within the tested range, the experimental results indicate that setting K = 5 yields superior performance. Therefore, K is set to 5 for the subsequent experiments.
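The parameter K enters the model through a kNN affinity graph whose Laplacian regularizes the representation; a minimal numpy construction is sketched below (the Gaussian weights and max-symmetrization rule are illustrative choices, not necessarily the exact ones used in the paper):

```python
import numpy as np

def knn_laplacian(X, k=5, sigma=1.0):
    """Build a symmetric kNN affinity W and the unnormalized Laplacian L = D - W.

    X: n x d matrix of samples (one per row)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]                  # k nearest neighbors, skip self
        W[i, idx] = np.exp(-d2[i, idx] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                                # symmetrize
    L = np.diag(W.sum(1)) - W
    return W, L
```

By construction L is symmetric positive semidefinite with zero row sums, which is what makes the Laplacian term a valid smoothness penalty on the representation tensor.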
5.5.2 Sensitivity analysis of initialization of core variables.
To comprehensively investigate the impact of the initialization of the core variables on the 2D-NLRSC algorithm, we design multiple sets of comparative experiments: we perform 5 repeated runs for zero initialization and for three random Gaussian initializations with different variances, and the averaged results are summarized in Table 7. On the ORL dataset, both the ACC and NMI of zero initialization are the highest among all initialization methods; on the Yale dataset, its ACC and NMI are also significantly better than those of the random initializations, indicating that zero initialization provides a better starting point and enables the model to converge to a higher-quality solution. As the variance of the Gaussian distribution increases from 0.1 to 1, the ACC on the ORL dataset decreases from 0.8974 to 0.8785 and the NMI from 0.9474 to 0.9408; on the Yale dataset, the ACC decreases from 0.9318 to 0.9273 and the NMI from 0.9408 to 0.9348. The performance of random initialization thus shows a downward trend as the variance increases, and the larger the variance, the more pronounced the degradation, indicating a certain sensitivity of the algorithm to the variance of random initialization.
For both the ORL dataset (with 41 iterations for all) and the Yale dataset (with 36 iterations for all), the number of convergence iterations of different initialization methods is completely consistent. This indicates that although initialization affects the quality of the solution, it has no significant interference with the convergence speed of the algorithm, reflecting the stability of the algorithm in terms of convergence efficiency. Therefore, based on the comprehensive experimental evidence that zero initialization consistently achieves superior performance on both datasets without impairing convergence speed, it is adopted as the initialization method of choice in this paper.
5.6 Parameter sensitivity
In our method, three balancing parameters and the hyperparameters r, p, and q jointly govern the objective. We systematically investigated their impact on clustering performance (quantified by ACC and NMI) across five benchmark datasets. Through exhaustive grid search, optimal parameter configurations were empirically determined for each dataset, as presented in Tables 3–6.
For the ORL and Yale datasets, we fixed all parameters except r and p, which were varied over a candidate range to isolate their effects. To analyze the joint influence of q and one balancing parameter on the Yale and ORL datasets, we varied the two while keeping the other variables constant. For the JAFFE, CMU-PIE, Yale, and MNIST (L = 10) datasets, one balancing parameter is fixed at 0.001, and the remaining two were varied over a candidate range to evaluate their regulatory effects. Visualizations of these parameter analyses are provided in Figs 5 and 6.
As shown in Fig 5, the value of r does not always correlate positively with clustering performance. On the ORL and Yale datasets, the clustering effect declines significantly as r approaches 1, whereas setting r around 0.5 yields the best results. For parameter p, a threshold effect is evident: when p < 0.7, clustering performance is constrained; as p increases, performance improves gradually. However, when p = 1 the ℓ2,p-norm degenerates into the ℓ2,1-norm, which weakens the sparsity constraint and degrades ACC. Based on the experimental results, the optimal range for p is [0.7, 1). Parameter q exhibits a relatively stable influence, with stable and superior clustering performance observed for values within [0.4, 0.9]. To balance generalization ability and clustering performance, dataset-specific values are selected for the ORL, JAFFE, CMU-PIE, Yale, MNIST, and COIL-20 datasets, respectively.
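The role of p can be made concrete with a small numpy sketch of the ℓ2,p column-sparse penalty; at p = 1 it reduces to the familiar ℓ2,1-norm, while p < 1 promotes stronger column sparsity:

```python
import numpy as np

def l2p_penalty(Z, p):
    """||Z||_{2,p}^p: sum over columns of the p-th power of their Euclidean norms."""
    return float(np.sum(np.linalg.norm(Z, axis=0) ** p))
```

For example, a matrix whose columns have norms 5 and 2 yields a penalty of 7 at p = 1 (the ℓ2,1 case) but 29 at p = 2; shrinking p below 1 flattens the penalty on large columns while keeping zero columns free, which is exactly the column-sparsity effect the parameter study probes.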
As shown in Fig 6, after parameter tuning, the ACC and NMI of our algorithm exhibit similar trends across the different datasets, with the clustering performance deteriorating significantly only at extreme parameter settings; as shown in Fig 5, a similar degradation also appears on the Yale dataset. Based on these empirical results, we recommend moderate values for the balancing parameters, noting that q has a relatively stable impact.
Additionally, the running times of 2D-NLRSC on the different datasets are provided. As clearly shown in Table 8, our algorithm exhibits significantly shorter running times than the CLLRSmC and KCLLRSmC algorithms on the ORL, JAFFE, and CMU-PIE datasets. Furthermore, our algorithm achieves higher ACC and NMI values than KCLLRSmC. Overall, our algorithm demonstrates an acceptable time complexity.
6 Convergence analysis
The convergence of the 2D-NLRSC is examined from two distinct perspectives in this section. First, a rigorous theoretical analysis is provided to establish the convergence properties of the proposed method. Second, the convergence behavior is demonstrated through plots of various variables, as shown in Fig 7.
Lemma 2. [31] Suppose F(X) can be written as F(X) = f∘σ(X), where X ∈ ℝ^{m×n} has the SVD X = U Σ Vᵀ, Σ = diag(σ₁, …, σₙ), and f is differentiable. The gradient of F(X) at X is
∂F(X)/∂X = U diag(θ) Vᵀ,
where θ = f′(σ(X)).
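Lemma 2 can be sanity-checked numerically: for F(X) = Σᵢ f(σᵢ(X)) with distinct singular values, the gradient U diag(f′(σ)) Vᵀ should match finite differences. The sketch below uses f(s) = log(1 + s), an arbitrary smooth choice for the check (not the paper's objective):

```python
import numpy as np

def spectral_grad(X, fprime):
    """Gradient of F(X) = sum_i f(sigma_i(X)) via Lemma 2: U diag(f'(sigma)) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(fprime(s)) @ Vt

def numeric_grad(F, X, eps=1e-6):
    """Central finite differences, entry by entry."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (F(X + E) - F(X - E)) / (2 * eps)
    return G
```

Agreement of the two gradients on a random matrix (where distinct singular values hold almost surely) confirms the spectral-function formula the convergence analysis relies on.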
Lemma 3. [38] If the matrix H is positive definite, then
⟨x, Hx⟩ ≥ a⟨x, x⟩ for all x,
where ⟨·, ·⟩ represents the inner product, and a stands for the smallest eigenvalue of H.
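Lemma 3 is the standard Rayleigh-quotient bound, which a quick numerical check illustrates (the test matrix below is a hypothetical positive definite example, not data from the paper):

```python
import numpy as np

def min_eig_bound_holds(H, trials=200, seed=1):
    """Check <x, Hx> >= lambda_min(H) * <x, x> over random x (Lemma 3)."""
    rng = np.random.default_rng(seed)
    a = float(np.linalg.eigvalsh(H)[0])   # smallest eigenvalue
    return all(x @ H @ x >= a * (x @ x) - 1e-9
               for x in rng.standard_normal((trials, H.shape[0])))
```

The bound is tight exactly at the eigenvector of the smallest eigenvalue, which is the constant used later to bound the Laplacian-regularized quadratic term.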
Lemma 4. [38] In the finite-dimensional Euclidean space, every bounded sequence of vectors has a subsequence that converges.
Lemma 5. [39] Consider the nonconvex optimization problem with ℓp regularization
min_x f(x) + λ‖x‖_p^p, (53)
where x ∈ ℝ^n, λ > 0, and 0 < p < 1. Assume the function f satisfies the following conditions: f has an Lf-Lipschitz continuous gradient, i.e., ‖∇f(x) − ∇f(y)‖ ≤ Lf‖x − y‖ for all x, y, and f is bounded below on ℝ^n.
Then the following properties hold:
- (i) If x* is a local minimizer of (53), then it satisfies the first-order stationary condition
[∇f(x*)]_i + λp|x*_i|^{p−1} sign(x*_i) = 0 for every i with x*_i ≠ 0. (55)
- (ii) Let x* be a stationary point whose objective value does not exceed that of some initial point x⁰, for a sufficiently small regularization weight. Then each nonzero component of x* admits a uniform positive lower bound. (56)
Lemma 6. is bounded.
Proof: In order to achieve the minimization of at the
-th step as illustrated in equation (26), the optimal solution
must satisfy the following condition
where . The expression
has a singularity near
. To circumvent this, we propose an approximation for
with
Let be expressed as
. From Definition 4 and Lemma 2, it can be deduced that
Then we have
Therefore, it can be seen that is bounded. We can denote
, where
is the DFT matrix of size
, and
is its conjugate transpose. Given that
, the following result is derived using the chain rule of matrix calculus
is bounded.
From the relations and
, it follows that
is bounded.
Lemma 7. is bounded.
Proof: To prove that is bounded, consider the minimization of
at the
-th step as illustrated in (30). The optimal solution
must satisfy
From the update , it follows that
The boundedness of the norm with power q [40] implies the boundedness of
.
Lemma 8. is bounded.
Proof: To achieve the minimization of at the
-th step as illustrated in (32), the optimal solution
requires to satisfy the following
due to the update by
, then we can obtain
Based on the proof above, auxiliary variable is bounded, which implies that
and its associated auxiliary variable
are also bounded. According to the triangle inequality, it can be obtained that
Clearly, is bounded.
Lemma 9. is bounded.
Proof: Consider the minimization of at the
-th step as illustrated in (35). The optimal solution
must satisfy
Given , the partial derivative of
with respect to
is expressed as follows
Thus, the following inequality holds
From the update , this can be rewritten as
Thus, is bounded.
Theorem 3. Let be the sequence generated by Algorithm 1. Then the sequence
satisfies the following two properties:
- The sequence is bounded.
- Any accumulation point of
is a KKT point of (25).
Proof: 1) Proof of the first part of Theorem 3:
Given the following update rules
It can be deduced that
Summing both sides of Eq (72) from t = 1 to k, we have
Since the sequences and
are bounded, and
It can be found that the right-hand side of (73) is finite. Thus is bounded. We observe
According to Lemma 2, we can find that the right-hand side of (75) is bounded. Since the right-hand side of (75) is nonnegative, every term is also bounded. The boundedness of means that all singular values of
are bounded. Consequently, this ensures that
(the sum of the squares of the singular values) is also bounded. Therefore, the sequence
is bounded.
According to the Lemma 3,
where is the smallest positive eigenvalue of the positive definite matrix L. Considering the expression
is bounded, we can further deduce that
is bounded.
Given that for all
with constant M>0 and
, we prove the sequence
is bounded. For
, the
- norm is defined as
where satisfies
Define for each column j. Then
. As
and p > 0, for all j,
Since preserves element values,
corresponds to the sum of squares over all entries of
with fixed second index i2 = j,
Thus, for any entry ,
Set . Then
for all
. Hence,
is bounded.
Similarly, the boundedness of implies that the sequences
is also bounded. We have
because and
are bounded, we deduce that
is also bounded.
In conclusion, it can be proven that is bounded.
2) Proof of the second part of Theorem 3:
Above we have proved that the sequence generated by Algorithm 1 is ensured to be bounded. According to Lemma 4, which asserts that any bounded sequence in
possesses a convergent subsequence, it follows that
is bound to have at least one point of accumulation. Let one of these points be denoted as
. Suppose, without loss of generality, that
.
Based on the update rule regarding , it can be concluded that
. Then, by taking the limit of both sides of this equation
we obtain .
Similarly, we obtain . Based on the update rule for
, it is evident that
then we can obtain
Considering the -subproblem, it is observed that
Considering the -subproblem, it can be obtained that
Considering the -subproblem, it can be obtained that
It can be seen that
Considering the -subproblem, we obtain
and it implies that
Therefore, satisfies the following KKT conditions
The KKT conditions presented below can be applied to determine the termination criteria for Algorithm 1
where stands for a predefined tolerance. As has been mentioned previously, the sequence complies with the KKT conditions of the Lagrange function (25).
Theorem 4. In our algorithm, the sequences ,
,
,
, and
are Cauchy sequences and converge to their critical points.
Proof of Theorem 4: We first prove that is a Cauchy sequence. From the update rule
, we can infer the following result
where .
Furthermore, this leads to the relation
where the first equality follows from , the second from Theorem 2, and the third from the unitary invariance of the Frobenius norm.
Recall the subproblem
The KKT conditions yield
where is the Lagrange multiplier.
If , it follows that
If and
, it can be concluded that
If and
, this leads to the conclusion that
Thus, in all cases, . Substituting into (94), we obtain the upper bound
Therefore,
Since is bounded by M1 and
with
and
, it can therefore be concluded that
The right-hand side is a convergent geometric series, so is a Cauchy sequence.
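The geometric-series argument can be spelled out. Assuming a per-step bound of the form ‖G^{t+1} − G^t‖_F ≤ M/ρ_t with ρ_t = λ^t ρ₀ and λ > 1 (M, λ, ρ₀ stand in for the paper's constants), one has for any m > n:

```latex
\|\mathcal{G}^{m}-\mathcal{G}^{n}\|_F
  \le \sum_{t=n}^{m-1}\|\mathcal{G}^{t+1}-\mathcal{G}^{t}\|_F
  \le \sum_{t=n}^{m-1}\frac{M}{\lambda^{t}\rho_0}
  \le \frac{M}{\rho_0}\cdot\frac{\lambda^{-n}}{1-\lambda^{-1}}
  \;\xrightarrow{\;n\to\infty\;}\; 0,
```

so the sequence is Cauchy; this is the same pattern reused for each of the remaining variables in Theorem 4.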
Next, we prove that is a Cauchy sequence. Using the update
, we have
Since is bounded and
is a Cauchy sequence, it follows that
Hence, is a Cauchy sequence.
Subsequently, we provide a formal proof that is a Cauchy sequence. Similar to the proof of Lemma 9, we derive the following inequality.
Here, M2 is its upper bound. From the update , it implies
Then,
Utilizing the linearity property of the t-product, we can rewrite the expression as
. Within the framework of tensor t-product, the norm typically satisfies the following inequality
where denotes the spectral norm of tensor
. Because
is a fixed input tensor and its norm is bounded, then
Since it has been proven that is a Cauchy sequence, the right-hand side is bounded.
On the basis of (109) and the boundedness of the sequence, we have
Therefore, is a Cauchy sequence.
Next, we prove that is a Cauchy sequence. From the update
, this yields
where .
The subproblem for is
By Lemma 5, the optimality condition is
Note that if and only if
.
If , then
Since and
, we have
. Moreover, by Lemma 5, there exists a positive lower bound
such that
when
, so
Let M3 be the upper bound in (115). Then for ,
If , then
, and the inequality holds trivially.
Therefore,
Since is bounded by M4 and
grows geometrically, we have
Thus, is a Cauchy sequence.
Finally, the proof that is a Cauchy sequence follows similarly from the update
.
In conclusion, ,
,
,
, and
are all Cauchy sequences and hence converge to the critical points of the objective function
.
Additionally, by running Algorithm 1 on the four datasets above, we practically validated the convergence of the proposed 2D-NLRSC. As shown in Fig 7, the error curves quickly stabilize after a few iterations on each dataset. These two aspects collectively demonstrate that the 2D-NLRSC model exhibits stability and rapidity.
6.1 Convergence rate analysis
We complement the convergence proof by quantifying the rates for both feasibility residuals and objective/value descent under standard assumptions used by augmented Lagrangian and splitting schemes.
Notation 1. Let the penalty parameters satisfy and
with a fixed
. Define the primal feasibility residuals
and the data-consistency residual
Theorem 5 (Geometric decay of feasibility residuals). Suppose ,
,
and
are bounded (as established by Lemmas 6–9). Then there exist finite constants
such that, for all
,
Consequently, since and
with
, all residuals decay R-linearly
Proof: From the optimality of the -subproblem and the update
, we have
The same argument using the and
updates yields the bounds for
and
. For
, the optimality of the
-subproblem together with
gives
Geometric decay follows from .
Remark 1. Theorem 5 implies that the KKT termination test in (92) is met after finitely many iterations, with the count dominated by the largest of the four residuals.
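Under Notation 1, the residual bound of Theorem 5 translates into an explicit iteration count. Assuming r_t ≤ C/ρ_t with ρ_t = λ^t ρ₀ (C a generic constant standing in for the paper's), the tolerance ε in the termination test is reached once:

```latex
\frac{C}{\lambda^{t}\rho_0} \le \varepsilon
\quad\Longleftrightarrow\quad
t \;\ge\; \frac{\log\!\bigl(C/(\rho_0\,\varepsilon)\bigr)}{\log\lambda}
\;=\; O\!\bigl(\log(1/\varepsilon)\bigr),
```

i.e., the R-linear decay of the residuals yields a logarithmic dependence on the target tolerance.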
7 Conclusion
This study proposes 2D-NLRSC, a submodule clustering framework using nonconvex low-rank tensor approximation. Unlike traditional vectorization approaches, it rotates each 2D image into a lateral slice of a third-order tensor and employs the t-product to derive self-representative tensors. The key innovations involve an ℓr-induced tensor nuclear norm for optimal low-rank representation, along with combined ℓ2,p column-sparse and Laplacian regularization to enhance structure capture while effectively handling noise and redundancy. Theoretical convergence is demonstrated through the KKT conditions. Experiments on ORL, JAFFE, CMU-PIE, Yale, MNIST, and COIL-20 show superior clustering accuracy and computational efficiency. Despite these advantages, this study still has certain limitations. The performance of 2D-NLRSC relies on the selection of several hyperparameters, which may require adjustment when applied to new datasets. Moreover, although relatively efficient, the computational complexity of its iterative optimization remains high for very large-scale image data. Future work will focus on automating hyperparameter selection and extending the framework to handle higher-order imaging data.
References
- 1.
Wright J, Ma Y. High-dimensional data analysis with low-dimensional models: principles, computation, and applications. Cambridge: Cambridge University Press; 2022.
- 2. Elhamifar E, Vidal R. Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell. 2013;35(11):2765–81. pmid:24051734
- 3. Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell. 2013;35(1):171–84. pmid:22487984
- 4. Zhou Q, Che H, Guo W, He X, Leung MF, Wen S. Robust low-rank tensor constrained orthogonal symmetric non-negative matrix factorization for multi-layer networks community detection. IEEE Transactions on Emerging Topics in Computational Intelligence.
- 5. Pu X, Che H, Pan B, Leung M-F, Wen S. Robust weighted low-rank tensor approximation for multiview clustering with mixed noise. IEEE Trans Comput Soc Syst. 2024;11(3):3268–85.
- 6. Xie F, Yuan J, Nie F, Li X. Dual-bounded nonlinear optimal transport for size constrained min cut clustering. arXiv preprint arXiv:2501.18143, 2025.
- 7. Shi L, Shen Z, Yan J. Double-bounded optimal transport for advanced clustering and classification. AAAI. 2024;38(13):14982–90.
- 8. Braman K. Third-order tensors as linear operators on a space of matrices. Linear Algebra and its Applications. 2010;433(7):1241–53.
- 9. Kilmer ME, Martin CD. Factorization strategies for third-order tensors. Linear Algebra and its Applications. 2011;435(3):641–58.
- 10. Kilmer ME, Braman K, Hao N, Hoover RC. Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J Matrix Anal & Appl. 2013;34(1):148–72.
- 11. Li Q, Yang G. Multi-view clustering via global-view graph learning. PLoS One. 2025;20(6):e0321628. pmid:40455775
- 12. Zhuge W, Hou C, Jiao Y, Yue J, Tao H, Yi D. Robust auto-weighted multi-view subspace clustering with common subspace representation matrix. PLoS One. 2017;12(5):e0176769. pmid:28542234
- 13. Chen H, Liu X. Reweighted multi-view clustering with tissue-like P system. PLoS One. 2023;18(2):e0269878. pmid:36763648
- 14. Wu T, Bajwa WU. A low tensor-rank representation approach for clustering of imaging data. IEEE Signal Process Lett. 2018;25(8):1196–200.
- 15. Yin M, Gao J, Xie S, Guo Y. Multiview subspace clustering via tensorial t-product representation. IEEE Trans Neural Netw Learn Syst. 2019;30(3):851–64. pmid:30059323
- 16. Wu T. Graph regularized low-rank representation for submodule clustering. Pattern Recognition. 2020;100:107145.
- 17. Francis J, Madathil B, George SN, George S. A robust tensor-based submodule clustering for imaging data using ℓ12 regularization and simultaneous noise recovery via sparse and low rank decomposition approach. J Imaging. 2021;7(12):279. pmid:34940746
- 18. Madathil B, George SN. Noise robust image clustering based on reweighted low rank tensor approximation and ℓ12 regularization. SIViP. 2020;15(2):341–9.
- 19. Francis J, M B, George SN. A unified tensor framework for clustering and simultaneous reconstruction of incomplete imaging data. ACM Trans Multimedia Comput Commun Appl. 2020;16(3):1–24.
- 20. Candès EJ, Wakin MB, Boyd SP. Enhancing sparsity by reweighted ℓ1 minimization. J Fourier Anal Appl. 2008;14(5–6):877–905.
- 21.
Zuo W, Meng D, Zhang L, Feng X, Zhang D. A generalized iterated shrinkage algorithm for non-convex sparse coding. In: 2013 IEEE International Conference on Computer Vision. 2013. p. 217–24. https://doi.org/10.1109/iccv.2013.34
- 22. Zha Z, Yuan X, Wen B, Zhou J, Zhang J, Zhu C. A benchmark for sparse coding: when group sparsity meets rank minimization. IEEE Trans Image Process. 2020. https://doi.org/10.1109/TIP.2020.2972109. pmid:32167891
- 23. Peng C, Zhang Y, Chen Y, Kang Z, Chen C, Cheng Q. Log-based sparse nonnegative matrix factorization for data representation. Knowl Based Syst. 2022;251:109127. pmid:40809933
- 24. Semerci O, Hao N, Kilmer ME, Miller EL. Tensor-based formulation and nuclear norm regularization for multienergy computed tomography. IEEE Trans Image Process. 2014;23(4):1678–93. pmid:24808339
- 25. Xie D, Yang M, Gao Q, Song W. Non-convex tensorial multi-view clustering by integrating l1-based sliced-Laplacian regularization and l2,p-sparsity. Pattern Recognition. 2024;154:110605.
- 26. Kernfeld E, Aeron S, Kilmer M. Clustering multi-way data: a novel algebraic approach. arXiv preprint 2014. https://arxiv.org/abs/1412.7056
- 27.
Lang S. Algebra. New York: Springer; 2005.
- 28. Wu T. Online tensor low-rank representation for streaming data clustering. IEEE Trans Circuits Syst Video Technol. 2023;33(2):602–17.
- 29.
Wu T. Robust data clustering with outliers via transformed tensor low- rank representation. In: International Conference on Artificial Intelligence and Statistics; 2024. p. 1756–64.
- 30. Francis J, Madathil B, George SN, George S. A nonconvex low rank and sparse constrained multiview subspace clustering via l1-induced tensor nuclear norm. IEEE Trans Signal Inf Process Netw. 2023.
- 31. Kang Z, Peng C, Cheng J, Cheng Q. LogDet rank minimization with application to subspace clustering. Comput Intell Neurosci. 2015;2015:824289. pmid:26229527
- 32.
Horn RA, Johnson CR. Matrix analysis. Cambridge: Cambridge University Press; 2012.
- 33.
Lu C, Min H, Zhao ZQ, Zhu L, Huang DS, Yan S. Robust and efficient subspace segmentation via least squares regression. In: Proceedings of the European Conference on Computer Vision. 2012. p. 347–60.
- 34. Tang K, Liu R, Su Z, Zhang J. Structure-constrained low-rank representation. IEEE Trans Neural Netw Learn Syst. 2014;25(12):2167–79. pmid:25420240
- 35. Heckel R, Bolcskei H. Robust subspace clustering via thresholding. IEEE Trans Inform Theory. 2015;61(11):6320–42.
- 36. Li C-G, You C, Vidal R. Structured sparse subspace clustering: a joint affinity learning and subspace clustering framework. IEEE Trans Image Process. 2017;26(6):2988–3001. pmid:28410106
- 37.
Patel VM, Vidal R. Kernel sparse subspace clustering. In: 2014 IEEE International Conference on Image Processing (ICIP), 2014. 2849–53. http://dx.doi.org/10.1109/icip.2014.7025576
- 38.
Lax PD. Linear algebra and its applications. New York: Wiley; 2007.
- 39. Lu Z. Iterative reweighted minimization methods for lp regularized unconstrained nonlinear programming. Math Program. 2013;147(1–2):277–307.
- 40. Ito K, Kunisch K. A variational approach to sparsity optimization based on Lagrange multiplier theory. Inverse Problems. 2013;30(1):015001.