Abstract
Tensor-based subspace clustering algorithms have garnered significant attention for their high efficiency in clustering high-dimensional data. However, when dealing with 2D image data, the vectorization step in most traditional algorithms tends to destroy the correlations among higher-order tensor entries. To tackle this limitation, this paper proposes a non-convex submodule clustering approach (2D-NLRSC) that leverages sparse and low-rank representations for 2D image data. An ℓr-induced tensor nuclear norm is introduced to approximate the tensor rank precisely. Instead of vectorizing each 2D image, the framework arranges samples as lateral slices of a third-order tensor and employs the t-product operation to generate an optimal representation tensor under a low-rank constraint. The proposed method combines ℓq-norm induced clustering awareness with Laplacian regularization to obtain a representation tensor with a block-diagonal structure. Additionally, 2D-NLRSC incorporates the ℓ2,p-norm as a regularization term, taking advantage of its excellent invariance, continuity, and differentiability. Experimental results on real image datasets validate the superior performance of the 2D-NLRSC model.
Citation: Yang M, Han S, Chen L, Wang J (2026) Interpretable nonconvex submodule clustering algorithm using ℓr-induced tensor nuclear norm and ℓ2,p column sparse norm with global convergence guarantees. PLoS One 21(1): e0339534. https://doi.org/10.1371/journal.pone.0339534
Editor: Longxiu Huang, Michigan State University, UNITED STATES OF AMERICA
Received: August 25, 2025; Accepted: December 7, 2025; Published: January 2, 2026
Copyright: © 2026 Yang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The complete source code supporting the findings of this study has been deposited in the GitHub repository at https://github.com/HSM-1-1/Code.git and is publicly available.
Funding: This work was supported by Heilongjiang Provincial Natural Science Foundation Joint Guidance Project No. JJ2024LH0820.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
High-dimensional data often resides within lower-dimensional subspaces, revealing underlying subspace structures [1]. This insight underpins subspace clustering, which assumes that data points are sampled from multiple subspaces, with each point being linearly representable by a subset of points within the same subspace. The objective of subspace clustering is to divide data into clusters, where the points within each cluster lie in the same subspace. To address these challenges, a range of subspace clustering techniques have been proposed. Traditional subspace clustering methods, for example, Sparse Subspace Clustering (SSC) [2] and Low-Rank Representation (LRR) [3], have been extensively employed to partition data into multiple, potentially overlapping linear subspaces, thereby minimizing redundancy. These techniques have proven highly effective across various applications involving high-dimensional data, such as image clustering and temporal video segmentation [3]. Among the most prominent techniques, spectral clustering-based approaches have gained substantial traction over the past decade. These approaches involve constructing an affinity matrix and applying algorithms such as K-means or Normalized Cuts (Ncut). The versatility of these algorithms in diverse clustering tasks has been a key factor in their growing popularity. Recently, several works have focused on addressing performance degradation caused by noise in multi-view clustering. For example, Zhou et al. [4] proposed the RTOSNMF method, which tackles the issues of noise interference and insufficient utilization of inter-layer relationships in multi-layer network community detection by performing denoising via linear separation and a sparse norm, as well as exploring inter-layer relationships through a nuclear norm-constrained low-rank property. Che et al.
[5] put forward a robust multi-view clustering method based on weighted low-rank tensor approximation and noise separation: on the one hand, it designs differentiated constraints for different types of noise to achieve fine-grained noise elimination; on the other hand, it efficiently explores high-order correlations among multiple views through weighted low-rank tensor modeling. Xie et al. [6] reformulated the min-cut problem as a bi-bounded constraint problem and developed an algorithm suitable for size-constrained min-cut scenarios and generalizable to broader bi-bounded nonlinear optimal transport problems. Shi et al. [7] proposed an optimal transport framework with both upper and lower bound constraints to enhance clustering and classification by better capturing structural relationships in data.
However, conventional subspace clustering methods encounter challenges when dealing with high-dimensional tensor data [2,3]. This challenge lies at the core of the current research. These techniques often lack robust theoretical guarantees for clustering accuracy, particularly due to the high dimensionality of the data, which requires substantial time and memory resources. Our framework addresses this gap by providing a robust, tailored solution for image data clustering.
When handling imaging data, traditional subspace clustering methods often vectorize data samples, enabling the application of algorithms designed for vectorial data. While effective in some cases, this vectorization process disrupts the intrinsic multidimensional structure of images, thereby reducing the reliability of subsequent analysis. It increases the dimensionality of the data, exacerbating the curse of dimensionality, and it also destroys the inherent high-dimensional structure of the data. To address this limitation, various works have leveraged multilinear algebra tools to preserve and exploit the spatial characteristics of imaging data [8–10]. One significant innovation in this domain is the t-product, a matrix-analogous multiplication operation for third-order tensors that improves the exploitation of their internal structure. In this framework, the representation matrices from various views are arranged into slices of a three-dimensional tensor [11–13]. Utilizing operations from multilinear and abstract algebra enables the more effective exploration of third-order tensor properties. The introduction of the t-product simplifies tensor-tensor multiplication, enabling more efficient and accurate modeling of image data [14,15].
For instance, Wu [16] proposed a tensor-based submodule clustering method for 2D imaging data, which organizes samples as lateral slices of third-order tensors via the t-product and integrates low-rank constraints, manifold regularization, and a unified ADMM-spectral clustering framework with nonlinear extensions. Francis et al. [17] developed a robust unsupervised tensor subspace clustering method using a regularized tensor nuclear norm (TNN) to preserve geometric structures and enable slice-wise sparse and low-rank decomposition for noise removal. Madathil et al. [18] introduced a noise-robust tensor clustering approach via reweighted nuclear norms for enhanced low-rank representation, structured sparsity, and explicit noise separation, achieving high accuracy under severe corruption. Francis et al. [19] proposed a single-stage framework for incomplete imaging data, unifying tensor clustering (via sparse t-linear combinations with mode-3 low-multirank constraints) and missing-data reconstruction (through low-rank lateral slice approximations), thus preserving spatial structure.
Inspired by the above, we adopt the t-product, based on circular convolution, to model the dynamic characteristics of consecutive image sequences. Using the t-product, image data samples are grouped into a third-order tensor and represented by our proposed tensor low-rank model, built around the union of free submodules. The resulting affinity information is then utilized for final clustering.
By combining t-product operations and tensor factorization, we extend the traditional LRR clustering method to accommodate multi-view data. This paper employs sparse and low-rank representations, similar to those mentioned above. However, it incorporates the latest developments in sparse coding to propose a highly interpretable ℓr-induced tensor nuclear norm. The optimization of this norm is based on studies of the non-convex ℓr-norm by Candes et al. [20], Zuo et al. [21], and Zha et al. [22], combining sparsity and low-rank properties.
Peng et al. [23] proposed a novel NMF algorithm with a column-sparse (pseudo) norm applied to the factor matrix to enforce sparse properties. Inspired by this novel column-sparse norm and influenced by [22], we naturally generalize the above definition along the frontal slice direction to a tensor-based ℓ2,p-induced sparse approximation, thereby proposing a tensor ℓ2,p-(pseudo) norm.
A dissimilarity matrix M is constructed to impose constraints on the frontal slices of the representation tensor. Specifically, elements with smaller values (indicating higher similarity) in the dissimilarity matrix are used to inversely constrain the corresponding positions in the slices of the representation tensor to take on larger values. This approach enhances the representation tensor, yielding a more distinct cluster structure. We show that the nonconvex ℓq-norm, combined with our cluster-aware construction, effectively captures the block structure of the self-representation tensor. With an appropriate value of q, the ℓq-norm offers unique advantages in sparsity representation.
The key contributions of our paper are as follows.
- The proposed 2D-NLRSC algorithm leverages the submodular self-expression property to address the issue in traditional tensor subspace clustering where vectorization of 2D images destroys higher-order tensor correlations. It directly constructs image samples as lateral slices of a third-order tensor, effectively avoiding the structural information loss caused by vectorization.
- The ℓr-induced tensor nuclear norm is employed to accurately approximate the tensor rank and is combined with t-product operations to construct an optimal representation tensor under low-rank constraints, resolving the problem that traditional convex tensor nuclear norms fail to precisely characterize tensor low-rank structures.
- The ℓ2,p-norm is incorporated into 2D-NLRSC as a regularization term, utilizing its excellent invariance, continuity, and differentiability to enhance the model's resilience to outliers while optimizing the selection of representative data points to reduce redundancy. Additionally, the clustering-aware property of the ℓq-norm is combined with Laplacian regularization to guide the representation tensor toward block-diagonal structures consistent with clustering objectives.
- An efficient alternating direction method of multipliers (ADMM) algorithm is proposed, with its convergence rigorously proven based on the Karush-Kuhn-Tucker (KKT) conditions. Experiments conducted on real-world datasets validate the effectiveness of the proposed method.
The remainder of this paper is organized as follows. Sect 2 introduces the preliminaries. Sect 3 reviews the related work. Sect 4 presents the proposed 2D-NLRSC method and its optimization procedure. Experimental results are reported in Sect 5, while Sect 6 provides the convergence analysis. Finally, Sect 7 concludes the paper.
2 Notations and preliminary considerations
Here, the relevant definitions and preliminary notations for the variables are first presented. For the sake of conciseness, the related notations are summarized in Table 1.
Definition 1. (t-product [10]): Let A ∈ ℝ^(n1×n2×n3) and B ∈ ℝ^(n2×n4×n3). Then the t-product C = A * B is a tensor of size n1 × n4 × n3, i.e.,

C = A * B = fold(bcirc(A) · unfold(B)),

where bcirc(A) is defined as the block-circulant matrix

bcirc(A) = [A^(1), A^(n3), …, A^(2); A^(2), A^(1), …, A^(3); …; A^(n3), A^(n3−1), …, A^(1)],

with A^(k) denoting the k-th frontal slice of A. The operator unfold and its corresponding inverse operator fold are defined as

unfold(A) = [A^(1); A^(2); …; A^(n3)], fold(unfold(A)) = A.
Theorem 1. (t-SVD [24]): For A ∈ ℝ^(n1×n2×n3), the t-SVD of A is given by

A = U * S * V^⊤,

where U ∈ ℝ^(n1×n1×n3) and V ∈ ℝ^(n2×n2×n3) are orthogonal tensors, S ∈ ℝ^(n1×n2×n3) is an f-diagonal tensor, and * denotes the t-product.
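As a minimal numerical sketch of the t-product above, the operation can be computed slice-wise in the Fourier domain, which is equivalent to the fold/bcirc/unfold definition. The function name `t_product` is illustrative, not taken from the paper's released code:

```python
import numpy as np

def t_product(A, B):
    """t-product C = A * B for A (n1 x n2 x n3) and B (n2 x n4 x n3):
    DFT along mode 3, slice-wise matrix products, inverse DFT back."""
    n3 = A.shape[2]
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    # real input tensors give conjugate-symmetric slices, so the result is real
    return np.real(np.fft.ifft(Cf, axis=2))
```

The FFT route avoids ever forming the large block-circulant matrix bcirc(A), yet produces exactly the same tensor.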
Definition 2. (tensor ℓ2,p-norm [25]): For a matrix E ∈ ℝ^(m×n) with columns e_1, …, e_n, the ℓ2,p-norm is defined as

‖E‖_{2,p} = (Σ_{i=1}^{n} ‖e_i‖_2^p)^{1/p},

where 0 < p ≤ 1. Particularly, considering p = 1, the ℓ2,p-norm reduces to the ℓ2,1-norm:

‖E‖_{2,1} = Σ_{i=1}^{n} ‖e_i‖_2.

We then extend the matrix ℓ2,p-norm to the tensor ℓ2,p-norm as follows: for E ∈ ℝ^(n1×n2×n3) with frontal slices E^(k),

‖E‖_{2,p} = (Σ_{k=1}^{n3} Σ_{i=1}^{n2} ‖e_i^(k)‖_2^p)^{1/p},

where e_i^(k) denotes the i-th column of E^(k).
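Under this definition (one common convention; the paper's exact normalization may differ), the norm is straightforward to evaluate. The helper names `l2p_norm_matrix` and `l2p_norm_tensor` are our own:

```python
import numpy as np

def l2p_norm_matrix(E, p):
    """(sum_i ||e_i||_2^p)^(1/p) over the columns e_i of E, 0 < p <= 1."""
    return float((np.linalg.norm(E, axis=0) ** p).sum() ** (1.0 / p))

def l2p_norm_tensor(T, p):
    """Tensor extension: aggregate column norms over every frontal slice."""
    total = sum((np.linalg.norm(T[:, :, k], axis=0) ** p).sum()
                for k in range(T.shape[2]))
    return float(total ** (1.0 / p))
```

For p = 1 the matrix version reduces to the ℓ2,1-norm, i.e., the plain sum of column ℓ2 norms.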
3 Related work
Consider a data set that contains N images partitioned into m categories. Conventional subspace clustering pipelines vectorize each image X_i ∈ ℝ^(n1×n3) into a flattened vector x_i ∈ ℝ^D with D = n1 n3, constructing a data matrix X = [x_1, …, x_N] ∈ ℝ^(D×N). These vectors are hypothesized to reside near a union of m low-dimensional subspaces embedded in ℝ^D. The clustering objective is to group data points based on their intrinsic subspace affiliations.
Although the above vectorization approach has demonstrated excellent performance in numerous applications, it fails to account for the spatial structure of images. Additionally, these methods, which are typically used to represent the linear combinations of samples in subspaces, are unable to capture shifted copies within submodules. In contrast, the t-product [10] offers a novel algebraic approach that generalizes matrix multiplication to third-order tensors.
3.1 Linear algebra with the t-product
To preserve the spatial structure of data, N oriented matrices of size n1 × 1 × n3 are stacked into a third-order tensor X ∈ ℝ^(n1×N×n3). Here, ℝ^(1×1×n3) denotes the set of tube fibers of dimension 1 × 1 × n3, and ℝ^(n1×1×n3) denotes the set of oriented matrices of size n1 × 1 × n3. The goal is to define a multiplication operation between tube fibers, enabling "linear" combinations of oriented matrices in which the coefficients are themselves tube fibers rather than scalar values.

Following [26], the set ℝ^(n1×1×n3) can be regarded as a module over the ring ℝ^(1×1×n3). From this viewpoint, the t-product provides a natural generalization of matrix multiplication to third-order tensors, where the multiplication between elements is replaced by tube fiber multiplication, and the addition remains elementwise.
3.2 Representation of high-dimensional image data
Given an image matrix X_i ∈ ℝ^(n1×n3), we embed it into a third-order tensor X⃗_i ∈ ℝ^(n1×1×n3) by orienting it along the third mode, so that X⃗_i is a lateral slice whose tube fibers have length n3. Let {X⃗_1, …, X⃗_N} denote the collection of N such oriented tensors.

In the t-product framework, ℝ^(n1×1×n3) is viewed as a free module of rank n1 over the ring ℝ^(1×1×n3). Given a generating set (dictionary) of oriented matrices {B⃗_j}, any X⃗ admits a t-linear representation

X⃗ = Σ_j B⃗_j * c_j,

where the coefficients c_j ∈ ℝ^(1×1×n3) are tube fibers. For computational purposes, let B collect the dictionary elements as lateral slices and C stack the coefficient tubes. Then the above expression can be compactly written as

X⃗ = B * C.

Applying the discrete Fourier transform (DFT) along the third mode yields

X̄^(k) = B̄^(k) C̄^(k), k = 1, …, n3,

where X̄^(k), B̄^(k), and C̄^(k) denote the k-th frontal slices in the Fourier domain. This block-diagonal structure allows independent processing of each frequency slice, retaining spatial structure while enabling efficient computation.
By preserving the tensor structure of the original images, this representation mitigates the loss of spatial correlations caused by vectorization, and naturally supports linear modeling in the tensor algebra framework.
3.3 Submodule clustering by sparse and low-rank representation
Based on this theoretical framework, the sparse submodule clustering (SSmC) algorithm, as introduced in [26], can be expressed as a sparse self-representation problem: each oriented sample is written as a t-linear combination of the others, X = X * C, while a sparsity-promoting penalty is imposed on the representation tensor C.
Inspired by the concept that images from different free submodules [27] should exhibit low correlation, Wu et al. proposed the structurally constrained low-rank submodule clustering (SCLRSmC) method in [14]. Its learning process augments the low-rank self-representation objective with a weighted sparsity term on W ⊙ C, where W is a predefined data-dependent weight matrix and ⊙ represents the Hadamard product.
Additionally, Wu [28] proposed an online low-rank tensor subspace clustering (OLRTSC) algorithm based on the nonconvex reformulation of tensor low-rank representation (TLRR) and the t-SVD framework, aiming to efficiently recover and cluster tensor data. This approach significantly reduces computational complexity and storage costs, handles dynamic data, and extends to scenarios with missing data. He also introduced an outlier-robust tensor low-rank representation (OR-TLRR) method, which simultaneously performs outlier detection and tensor data clustering within the t-SVD framework [29]. Research shows that t-SVD effectively reduces data dimensionality, extracts key features, and enhances data processing efficiency.
Although the above algorithms have shown promising results on real datasets, approximating the tensor rank with the nuclear norm remains imprecise. Since the nuclear norm treats every singular value equally and penalizes noise components, it can lead to a suboptimal representation tensor. Therefore, this approach still requires improvement.
Recently, Jobin et al. [17,30] proposed a multi-view data clustering framework based on nonconvex low-rank tensor approximation.
Definition 3. (nonconvex induced tensor nuclear norm [30]): Consider a tensor A ∈ ℝ^(n1×n2×n3) with t-SVD A = U * S * V^⊤; then its induced tensor nuclear norm can be expressed as

‖A‖_φ = (1/n3) Σ_{k=1}^{n3} Σ_i φ(S̄^(k)(i, i)),

where φ(·) is a nonconvex penalty applied to the singular values of the Fourier-domain frontal slices.
Notably, this framework employs the nonconvex induced tensor nuclear norm (TNN) as a low tensor rank constraint, which enhances the low-rank property of the representation tensor. However, Zuo et al. [21] demonstrated in the sparse coding context that a fixed choice of the exponent is not necessarily optimal, and a general r ∈ (0, 1) should be considered to balance rank sparsity and numerical stability. The generic ℓr-norm minimization problem takes the form

min_x (1/2)(y − x)^2 + λ|x|^r,

where y is the input value (e.g., a singular value), λ > 0, and 0 < r < 1.
Problem (15) can be efficiently solved by the generalized soft-thresholding (GST) algorithm [21], which solves the scalar problem by an iterative thresholding rule. Specifically, given y, the GST update is x* = T_r^{GST}(y; λ), where T_r^{GST} is applied element-wise:

T_r^{GST}(y; λ) = 0, if |y| ≤ τ_r^{GST}(λ); sgn(y) z*, otherwise,

with threshold τ_r^{GST}(λ) = (2λ(1−r))^{1/(2−r)} + λr(2λ(1−r))^{(r−1)/(2−r)} and z* being the positive root of z − |y| + λr z^{r−1} = 0.
As pointed out by Zha et al. [22], ℓr minimization with r < 1 can more aggressively promote sparsity (and thus low rank) compared to the convex r = 1 case, often leading to better empirical performance in subspace clustering tasks.
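The GST rule described above can be sketched as follows, with the root z* found by fixed-point iteration started at |y|. This is an illustrative implementation of the threshold and iteration from [21], not the paper's released code:

```python
import numpy as np

def gst(y, lam, r, iters=10):
    """Generalized soft-thresholding for min_x 0.5*(x - y)^2 + lam*|x|^r,
    0 < r < 1, applied element-wise to the array y."""
    y = np.asarray(y, dtype=float)
    # below this threshold the minimizer is exactly 0
    tau = (2 * lam * (1 - r)) ** (1 / (2 - r)) \
        + lam * r * (2 * lam * (1 - r)) ** ((r - 1) / (2 - r))
    x = np.zeros_like(y)
    m = np.abs(y) > tau
    z = np.abs(y[m])                       # start fixed-point iteration at |y|
    for _ in range(iters):
        z = np.abs(y[m]) - lam * r * z ** (r - 1)
    x[m] = np.sign(y[m]) * z
    return x
```

The same scalar rule is reused later both on singular values (for the ℓr-induced TNN) and on column norms (for the ℓ2,p term).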
Inspired by the above, we introduce the ℓr-induced tensor nuclear norm to enforce low-rank structure in the tensor. Unlike [30], our method allows adjusting the value of r between 0 and 1, enabling a more flexible and effective low-rank representation of the tensor.
Definition 4. (ℓr-induced tensor nuclear norm): Consider a tensor A ∈ ℝ^(n1×n2×n3) with t-SVD A = U * S * V^⊤. Its ℓr-induced TNN can be defined as

‖A‖_{⊛,r} = (1/n3) Σ_{k=1}^{n3} Σ_i (S̄^(k)(i, i))^r,

where 0 < r < 1, U ∈ ℝ^(n1×n1×n3) and V ∈ ℝ^(n2×n2×n3) are orthogonal tensors, S is an f-diagonal tensor whose frontal slices contain diagonal matrices, and S̄ denotes its Fourier transform.
Xie et al. proposed a nonconvex tensor multi-view clustering framework [25], which introduces a novel column-sparse norm instead of the squared Frobenius norm for the error term: the ℓ2,p-norm with 0 < p < 1. This norm exhibits properties such as invariance, continuity, and differentiability. Inspired by [25], this paper applies the ℓ2,p-norm to 2D images.
4 Proposed method
4.1 Problem formulation and objective function
Recognizing that most conventional clustering methods rely on vectorized data representations and often overlook the intrinsic structure of image data, we focus on leveraging tensor representations, particularly for 2D images. Consequently, we propose a novel image submodule clustering method that preserves and leverages the inherent structure of image data to achieve more robust and reliable clustering outcomes.
For the 2D image samples X mentioned above, rather than vectorizing each image as in traditional subspace clustering methods, the images are arranged side by side and rotated to form a third-order tensor X ∈ ℝ^(n1×N×n3), where n1 × n3 denotes the size of the images and N indicates the number of images. Each image sample is represented using t-linear combinations. To obtain the optimal low-rank representation tensor Z, we apply an ℓr-induced tensor nuclear norm on Z. We add a sparse norm, the ℓ2,p-norm with 0 < p < 1, to strengthen the robustness of our approach. A slice-wise Laplacian regularization term centered around each image is also introduced to retain local details and expose nonlinear structures in high-dimensional space. This regularization guarantees consistent representations of data points within each image in the global space, enhancing overall efficiency. Therefore, the combination of sliced Laplacian regularization, the ℓq term, and the ℓr-induced TNN can be represented as follows
where Z is the representation tensor, E is the error tensor, and L_W is the Laplacian matrix of W, the adjacency matrix of a k-nearest neighbor (KNN) graph. It can be constructed by

W_ij = 1, if x_j ∈ N_k(x_i) or x_i ∈ N_k(x_j); W_ij = 0, otherwise.
Here, N_k(x_i) denotes the k-neighborhood of x_i under a given distance metric d(·, ·). Specifically, we define d(x_i, x_j) = ‖X̃_i − X̃_j‖_F, where X̃ is derived by normalizing each lateral slice of X such that ‖X̃_i‖_F = 1 for all i.
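The KNN graph and its unnormalized Laplacian L = D − W can be sketched as below, following the text's description (Frobenius distance between normalized lateral slices, symmetrized neighborhoods). The function name `knn_laplacian` is our own:

```python
import numpy as np

def knn_laplacian(X, k):
    """Unnormalized graph Laplacian L = D - W from a symmetric KNN graph.
    X: (n1, N, n3) data tensor; X[:, i, :] is the i-th lateral slice."""
    N = X.shape[1]
    # normalize each lateral slice to unit Frobenius norm (assumed metric)
    S = np.stack([X[:, i, :] / np.linalg.norm(X[:, i, :]) for i in range(N)])
    dist = np.array([[np.linalg.norm(S[i] - S[j]) for j in range(N)]
                     for i in range(N)])
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist[i])[1:k + 1]   # skip the sample itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                    # x_j in N_k(x_i) OR x_i in N_k(x_j)
    L = np.diag(W.sum(axis=1)) - W
    return L, W
```

The resulting L is symmetric positive semidefinite, which is what the slice-wise Laplacian regularization term relies on.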
The degree matrix D is a diagonal matrix whose i-th diagonal element is computed as D_ii = Σ_j W_ij. Relevant details can be found in [25]. Furthermore, a suitable block-diagonal structure of the representation tensor is beneficial for clustering multi-view data, enhancing the algorithm's performance. In multi-view data, objects that belong to the same submodule exhibit a strong correlation, while objects from different submodules show relatively lower correlation [14]. To enforce a block-diagonal structure on Z, we prioritize penalizing images that exhibit lower correlation, i.e., those belonging to different submodules. The correlation between different data points can be captured by the inconsistency weighted matrix M. The elements of M are defined as
M_ij = 1 − exp(−(1 − |⟨X̃_i, X̃_j⟩|)/σ),

where X̃ represents the tensor acquired by normalizing every lateral slice of X such that ‖X̃_i‖_F = 1 for i = 1, …, N, and σ is typically taken as the empirical average of all 1 − |⟨X̃_i, X̃_j⟩|.
Unlike (12), we obtain a sparser solution by substituting the ℓ1-norm with the ℓq-norm. By integrating all the above, our algorithm can be summarized as follows

min_{Z,E} ‖Z‖_{⊛,r} + λ1 ‖M ⊙ Z‖_q^q + λ2 ‖E‖_{2,p}^p + λ3 Σ_{k=1}^{n3} tr(Z^(k) L_W (Z^(k))^⊤), s.t. X = X * Z + E,

where λ1, λ2, and λ3 are balance parameters. The flowchart of 2D-NLRSC is shown in Fig 1. Our method processes the input image data as a third-order tensor, preserving the original spatial structure of images and avoiding the structural information loss caused by vectorization. The first term employs the ℓr-induced tensor nuclear norm to impose a low-rank constraint on the representation tensor, thereby capturing the global subspace structure of the data. The second term applies the ℓq-norm to the Hadamard product of the representation tensor and the dissimilarity weighted matrix, promoting element-wise sparsity and forming a block-diagonal structure conducive to clustering. The third term uses the ℓ2,p-norm to limit the error tensor, boosting column-wise sparsity and model robustness against outliers and noise. The fourth term uses Laplacian regularization to exploit the local geometric structure of the data, ensuring that the representation preserves neighborhood relations on the intrinsic manifold and thus strengthens intra-cluster cohesion and inter-cluster separation.
By solving (22), the optimal representation tensor Z* can be used to construct affinity matrices. The affinity matrix S is formulated as

S = (1/2)(|Ẑ| + |Ẑ|^⊤), with Ẑ_ij = Σ_{k=1}^{n3} |Z*^(k)(i, j)|.

Then, spectral clustering or the normalized graph cut algorithm is applied to the affinity matrix S to derive the final clustering outcomes.
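The final step can be sketched as below: aggregate the absolute frontal slices of the representation tensor into a symmetric affinity matrix (one common symmetrization; the paper's exact formula may differ), then embed with the normalized graph Laplacian before K-means. The helper names are our own:

```python
import numpy as np

def affinity_from_representation(Z):
    """Symmetric, nonnegative affinity from a representation tensor Z (N x N x n3):
    average absolute entries over frontal slices, then symmetrize."""
    A = np.abs(Z).mean(axis=2)
    return 0.5 * (A + A.T)

def spectral_embedding(S, m):
    """Rows of the m bottom eigenvectors of the normalized Laplacian;
    cluster these rows with K-means to obtain the final labels."""
    d = S.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    Lsym = np.eye(len(S)) - Dinv @ S @ Dinv
    _, vecs = np.linalg.eigh(Lsym)          # ascending eigenvalues
    return vecs[:, :m]
```

Any off-the-shelf spectral clustering routine accepting a precomputed affinity would serve the same purpose.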
4.2 Optimization
Three auxiliary variables are introduced into our optimization algorithm, which can be expressed as follows
The augmented Lagrangian function for this problem can be expressed as
1) Subproblem: Fixing all the other variables, the update is given by
Theorem 2. Let A ∈ ℝ^(n1×n2×n3) have the t-SVD decomposition A = U * S * V^⊤. Consider the ℓr-induced tensor nuclear norm optimization problem

min_X τ‖X‖_{⊛,r} + (1/2)‖X − A‖_F^2;

the optimal solution is given by

X* = U * S_GST * V^⊤,

where S_GST is an f-diagonal tensor whose diagonal elements (in the Fourier domain) are generated by the GST algorithm described in (16).
Applying Theorem 2, the optimal solution of (26) is
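A hedged sketch of the Theorem 2 procedure: SVD each Fourier-domain frontal slice, shrink the singular values with GST, and transform back. `prox_lr_tnn` is an illustrative name, and the compact `gst` helper is repeated here for self-containment:

```python
import numpy as np

def gst(y, lam, r, iters=10):
    """Scalar GST rule of (16), element-wise on arrays (here: singular values)."""
    tau = (2 * lam * (1 - r)) ** (1 / (2 - r)) \
        + lam * r * (2 * lam * (1 - r)) ** ((r - 1) / (2 - r))
    x = np.zeros_like(y)
    m = np.abs(y) > tau
    z = np.abs(y[m])
    for _ in range(iters):
        z = np.abs(y[m]) - lam * r * z ** (r - 1)
    x[m] = np.sign(y[m]) * z
    return x

def prox_lr_tnn(A, tau, r):
    """Proximal step for the lr-induced TNN: slice-wise SVD in the Fourier
    domain, GST shrinkage of singular values, inverse FFT back."""
    n3 = A.shape[2]
    Af = np.fft.fft(A, axis=2)
    Xf = np.empty_like(Af)
    for k in range(n3):
        U, s, Vh = np.linalg.svd(Af[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * gst(s, tau, r)) @ Vh   # rescale columns of U by shrunk s
    return np.real(np.fft.ifft(Xf, axis=2))
```

Since shrinkage acts only on singular values, the conjugate symmetry of the Fourier slices of a real input is preserved, so the inverse FFT is real up to round-off.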
2) Subproblem: The update for
is
Following the GST algorithm described in (16), the closed-form solution is
where .
3) Subproblem: The update for
is
Setting the partial derivatives with respect to to zero gives
The optimal solution is derived as
4) Subproblem: The update rule for
is formulated as
where , and
is the v-th frontal slice of
.
Let e_i denote the i-th column of E, and let g_i denote the i-th column of the corresponding input matrix. The objective in (35) can be reformulated column-wise as
so that each can be solved separately. For a specific
, the subproblem becomes
Here, each column e_i can be treated as a special matrix, and a thin SVD can be applied. It follows that e_i has exactly one singular value, given by σ(e_i) = ‖e_i‖_2, where σ(·) represents the singular value of the input vector. Hence, the subproblem is equivalent to
Now, we introduce Lemma 1.
Lemma 1. [31] Consider two complex matrices A and . Let
be defined as
where represents the vector consisting of the non-increasing singular values of A. If
is a complex unitarily invariant function (or quasi-norm) and B has an SVD formulated as
then the optimal solution for the minimization problem
has the SVD , where
and
According to Lemma 1, the solution to (39) is expressed as
where ui and are respectively the left and right singular vectors corresponding to
, and
which can be computed using the GST algorithm in (16).
It is evident that (refer to 7.2 of [32]) e_i = (e_i/‖e_i‖_2) · [‖e_i‖_2] · [1] denotes a thin SVD of e_i, where [1] represents a matrix with 1 as its only element. By substituting this into (45), we can derive the closed-form column-wise update.
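The column-wise solution just derived amounts to a scalar GST shrinkage on each column's ℓ2 norm followed by rescaling the column direction. A hedged sketch (`prox_l2p_columns` is our own name; `gst` is repeated for self-containment):

```python
import numpy as np

def gst(y, lam, r, iters=10):
    """Scalar GST rule of (16), element-wise on arrays (here: column norms)."""
    tau = (2 * lam * (1 - r)) ** (1 / (2 - r)) \
        + lam * r * (2 * lam * (1 - r)) ** ((r - 1) / (2 - r))
    x = np.zeros_like(y)
    m = np.abs(y) > tau
    z = np.abs(y[m])
    for _ in range(iters):
        z = np.abs(y[m]) - lam * r * z ** (r - 1)
    x[m] = np.sign(y[m]) * z
    return x

def prox_l2p_columns(G, lam, p):
    """Column-wise proximal of lam*||E||_{2,p}^p + 0.5*||E - G||_F^2:
    each column keeps its direction, its norm is shrunk by GST (Lemma 1)."""
    E = np.zeros_like(G)
    norms = np.linalg.norm(G, axis=0)
    shrunk = gst(norms, lam, p)
    nz = norms > 0
    E[:, nz] = G[:, nz] * (shrunk[nz] / norms[nz])
    return E
```

Columns whose norms fall below the GST threshold are zeroed outright, which is exactly the column-sparse behavior the ℓ2,p error term is meant to induce.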
5) Subproblem: The update for
is
where . Using the discrete Fourier transform, the solution is
Overall, the algorithm is summed up in Algorithm 1.
Algorithm 1 The algorithm of 2D-NLRSC.
Input: Given data tensor , dissimilarity matrix
, and parameters
,
, and
.
Output: Representation tensor .
1: Initialization:
, penalty parameter
,
,
,
,
, and t = 0.
2: while not converge do
3: Fix the others and update by (50);
4: Fix the others and update by (45);
5: Fix the others and update by (29);
6: Fix the others and update by (31);
7: Fix the others and update by (34);
8: Fix the others and update the Lagrange multipliers ,
,
and
by
;
;
;
;
9: Update the parameter and
by
;
;
10: Check the convergence conditions; if they are satisfied,
11: Output: Representation tensor .
12: else, t = t + 1;
13: end while
4.3 Complexity analysis
The computational cost of our method primarily arises from the updates of and
. When updating
, it is necessary to compute the FFT and inverse FFT of an
tensor along mode-3, as well as perform the SVD of
matrices in the Fourier domain. These operations require
computations per iteration. For updating
, the process involves calculating the matrix inverse and the FFT and inverse FFT transformations. The computational complexity of these operations is
. Consequently, the total computational complexity of the method is approximately
, where T1 denotes the number of iterations required to solve (24) using ADMM.
5 Experiments
5.1 Datasets
Five image datasets are selected for this algorithm, and they are described in Table 2 as follows
- 1) ORL: The ORL dataset comprises 40 distinct subjects, each represented by 10 different images. The experimental settings align with those described in [16]. All 400 images are utilized, and each image is of a dimension of
.
- 2) JAFFE: The JAFFE dataset comprises 213 images of 7 facial expressions from ten Japanese female subjects. Ten participants were selected for this study, each contributing their first 20 images. All 200 images are resized to
. The experimental setup follows the same approach outlined in [16].
- 3) CMU-PIE: The CMU-PIE face dataset comprises 42,368 images of 68 subjects, exhibiting diverse poses, lighting conditions, and facial expressions. A subset of 735 images was created, containing the initial 49 images from each of the first 15 subjects. These images were subsequently resized to
. The specific experiments and settings are in [16].
- 4) Yale: The Yale face dataset consists of 165 images of 15 individuals, each with a size of
. 11 distinct facial images, exhibiting varied expressions and lighting, were provided by each participant. These images were uniformly resized to
. The experimental conditions are aligned with those described in [16].
- 5) MNIST: The MNIST dataset contains 70,000 centered 28 × 28 images of handwritten digits (0-9). A subset of 1,000 images was created by selecting the first 100 images of each digit. The specific experimental settings are described in [16]. For each value of L, the clustering process was repeated 20 times with randomly selected categories. The average of these 20 clustering results was then used for evaluation.
- 6) COIL-20: The COIL-20 dataset contains 1,440 images from 20 different object categories. Each object category has 72 images captured from different angles, and the backgrounds of the objects were removed during the shooting process. These images were processed and downsampled to a size of
.
In our experiment, Accuracy (ACC) and Normalized Mutual Information (NMI) are utilized as performance evaluation metrics. Higher values for both ACC and NMI indicate superior clustering performance.
5.2 Compared clustering algorithms
To evaluate the performance of the algorithm proposed in this paper, we selected several state-of-the-art clustering algorithms as benchmarks for our experimental comparisons:
- LRR [3] is a clustering algorithm that reveals the latent structure in data by constructing a low-rank representation matrix.
- SSC [2] solves a sparse optimization program to derive the sparse representation of data points from other points.
- LSR [33] leverages data correlation to achieve subspace segmentation through a grouping effect that tends to cluster highly correlated data together, significantly improving segmentation accuracy while ensuring efficiency.
- SC-LRR [34] extends the standard LRR by introducing a predefined weight matrix to analyze the structure of multiple disjoint subspaces, breaking through the restriction of the standard LRR that requires subspaces to be independent.
- TSC [35] achieves clustering of noisy and incompletely observed high-dimensional data into a union of low-dimensional subspaces and outliers by thresholding the correlations between data points to obtain an adjacency matrix.
- S3C [36] learns both the affinity matrix and segmentation results simultaneously through a joint optimization framework, expressing each data point as a structured, sparse linear combination of other data points.
- KSSC [37] extends sparse subspace clustering to nonlinear manifolds via the kernel trick, enhancing clustering performance through nonlinear mappings.
- SSmC [26] integrates the t-product operation into the sparse subspace clustering framework, enhancing the model’s ability to capture local features of data through its convolutional structure.
- SCLRSmC [14] introduces a free submodules theory, enabling low-rank representation learning directly in the tensor space and avoiding the inherent loss of structural information associated with traditional vectorization preprocessing.
- CLLRSmC [16] considers the intrinsic manifold structure of data and integrates the two distinct stages of learning a low-rank representation tensor and performing spectral clustering into a unified optimization framework.
- KCLLRSmC [16] combines manifold regularization with kernel methods for manifold clustering, eliminating the need for explicit mapping of data into the feature space. It serves as a nonlinear extension of CLLRSmC.
5.3 Experiments results and analysis
The subsequent experiments use ACC and NMI as clustering evaluation metrics. Their precise definitions are provided in [16]. Each experiment is replicated 20 times, and the average results are recorded in Tables 3, 4, 5, and 6. The best results are highlighted in bold. As shown in the tables, the proposed model significantly outperforms other models regarding both ACC and NMI across the ORL, JAFFE, CMU-PIE, Yale, and MNIST datasets.
The parameters are set as ,
,
on ORL;
,
,
on JAFFE.
The parameters are set as ,
,
on CMU-PIE;
,
,
on Yale.
The parameters are set as ,
,
on MNIST.
The parameters are set as ,
,
on COIL-20.
On the ORL dataset, our method outperforms all other models, achieving improvements of 0.50% in ACC and 1.50% in NMI compared to the suboptimal KCLLRSmC.
On the JAFFE dataset, as shown in Table 3, the proposed model achieves a perfect score of 100% for both ACC and NMI. In particular, our model outperforms the suboptimal KCLLRSmC model by 0.36% in ACC and 0.77% in NMI.
Experiments on the CMU-PIE dataset demonstrate that 2D-NLRSC achieves superior performance compared to existing methods, yielding the highest ACC and NMI scores. Specifically, it surpasses the suboptimal method by 2.32% in ACC and 0.54% in NMI.
On the Yale dataset, 2D-NLRSC achieved the highest ACC and NMI scores, surpassing the best-performing suboptimal method by 1.03% and 0.77%, respectively.
On the MNIST dataset, compared to all other methods, the proposed 2D-NLRSC shows significant improvements in ACC and NMI. When the number of categories L = 8, the NMI is slightly lower than those of KCLLRSmC and CLLRSmC, but outperforms other algorithms. Compared with KCLLRSmC, when L = 3, the ACC and NMI metrics are improved by 2.16% and 3.59%, respectively. Similarly, when L = 5 and L = 10, the ACC and NMI metrics are improved by {0.37%, 0.52%} and {0.08%, 0.03%}, respectively.
Experimental evaluations on the COIL-20 dataset, detailed in Fig 2, illustrate that the 2D-NLRSC method yields significant advancements in clustering performance, reflected by its elevated ACC and NMI scores.
In summary, the proposed 2D-NLRSC method achieves significant improvements on almost all datasets. These results strongly validate that 2D-NLRSC can more effectively capture high-order correlation information in samples through the ℓr-induced tensor rank approximation and the joint sparse regularization based on the ℓ2,p-norm.
5.4 Ablation studies
To validate the role of each regularization term in the model, ablation experiments were conducted. Specifically, partial regularization terms were removed from the 2D-NLRSC model, and parameters were adjusted to achieve its optimal performance. On the CMU-PIE and Yale datasets, three groups of experiments were designed, with each experiment omitting one regularization term to derive three algorithms. Through parameter optimization for each algorithm, the impact of each regularization term on the model's performance was evaluated. The specific experimental results are shown in Fig 3.
According to Fig 3, the proposed 2D-NLRSC method outperforms all other ablation variants in terms of clustering performance, as measured by ACC and NMI. Systematic ablation trials that remove individual regularization terms exhibit consistent performance deterioration across all incomplete configurations. On the CMU-PIE dataset, the complete 2D-NLRSC model improves ACC by {0.86%, 15.42%, 0.64%, 18.19%} and NMI by {1.22%, 7.34%, 0.83%, 12.29%} over the four ablation versions. The Yale dataset shows performance increases of {2.36%, 9.76%, 1.45%, 13.18%} in ACC and {1.84%, 8.59%, 0.61%, 9.02%} in NMI.
These empirical findings lend strong support to the efficacy of multi-component collaboration in our approach. The performance loss observed under the first ablation emphasizes the necessity of the norm regularization that works in conjunction with the inconsistency matrix M: this combination applies differentiated constraints to sample pairs from different submodules, promoting the construction of a distinct block-diagonal structure in the representation tensor. The performance drop under the second ablation confirms that the ℓ2,p-norm introduces crucial column-wise sparse constraints, effectively eliminating interference from redundant information and outliers. The degradation observed under the third ablation demonstrates that the Laplacian regularization term effectively preserves the local geometric structure of the data; its absence disrupts neighborhood relationships that are crucial for maintaining clustering coherence. Finally, the significant performance degradation in the dual-ablated scenario highlights the complementary nature of these regularization techniques, which jointly contribute to the improved clustering accuracy of 2D-NLRSC on complex datasets.
5.5 Sensitivity analysis
To comprehensively evaluate the robustness and stability of the proposed algorithm, we conduct a sensitivity analysis focusing on two key aspects: the parameter K in K-nearest neighbors (KNN) and the initialization of the core variables.
5.5.1 Sensitivity analysis of KNN parameter K.
We set the range of K from 5 to 30 and conduct experiments on both ORL and Yale datasets, with the specific results shown in Fig 4. The experimental data indicates that: on the ORL dataset, when K is in the range [5, 30], the fluctuation of the algorithm's accuracy ACC is controlled within 1.5%, and the fluctuation of normalized mutual information NMI does not exceed 1%; on the Yale dataset, the ACC fluctuation is within 4%, and the NMI fluctuation does not exceed 3.6%. This result fully demonstrates the algorithm's robustness to changes in K. Further analysis reveals that the relationship between K and algorithm performance is not linear. Within the tested range, the experimental results indicate that setting K = 5 yields superior performance. Therefore, K is set to 5 for the subsequent experiments.
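The parameter K enters the model through a kNN affinity graph whose Laplacian regularizes the representation; a minimal numpy construction is sketched below (the Gaussian weights and max-symmetrization rule are illustrative choices, not necessarily the exact ones used in the paper):

```python
import numpy as np

def knn_laplacian(X, k=5, sigma=1.0):
    """Build a symmetric kNN affinity W and the unnormalized Laplacian L = D - W.

    X: n x d matrix of samples (one per row)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]                  # k nearest neighbors, skip self
        W[i, idx] = np.exp(-d2[i, idx] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                                # symmetrize
    L = np.diag(W.sum(1)) - W
    return W, L
```

By construction L is symmetric positive semidefinite with zero row sums, which is what makes the Laplacian term a valid smoothness penalty on the representation tensor.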
5.5.2 Sensitivity analysis of initialization of core variables.
To comprehensively investigate the impact of the initialization of the core variables on the 2D-NLRSC algorithm, we design multiple sets of comparative experiments: we perform 5 repeated runs for zero initialization and for three random Gaussian initializations with different variances, and the averaged results are summarized in Table 7. On the ORL dataset, both the ACC and NMI of zero initialization are the highest among all initialization methods; on the Yale dataset, its ACC and NMI are also significantly better than those of the random initializations, indicating that zero initialization provides a better starting point and enables the model to converge to a higher-quality solution. As the variance of the Gaussian distribution increases from 0.1 to 1, the ACC on the ORL dataset decreases from 0.8974 to 0.8785 and the NMI from 0.9474 to 0.9408; on the Yale dataset, the ACC decreases from 0.9318 to 0.9273 and the NMI from 0.9408 to 0.9348. The performance of random initialization thus shows a downward trend as the variance increases, and the larger the variance, the more pronounced the degradation, indicating a certain sensitivity of the algorithm to the variance of random initialization.
For both the ORL dataset (with 41 iterations for all) and the Yale dataset (with 36 iterations for all), the number of convergence iterations of different initialization methods is completely consistent. This indicates that although initialization affects the quality of the solution, it has no significant interference with the convergence speed of the algorithm, reflecting the stability of the algorithm in terms of convergence efficiency. Therefore, based on the comprehensive experimental evidence that zero initialization consistently achieves superior performance on both datasets without impairing convergence speed, it is adopted as the initialization method of choice in this paper.
5.6 Parameter sensitivity
In our method, three balancing parameters and the hyperparameters r, p, and q jointly govern the objective. We systematically investigated their impact on clustering performance (quantified by ACC and NMI) across five benchmark datasets. Through exhaustive grid search, optimal parameter configurations were empirically determined for each dataset, as presented in Tables 3–6.
For the ORL and Yale datasets, we fixed all parameters except r and p, which were varied over a candidate range to isolate their effects. To analyze the joint influence of q and one balancing parameter on the Yale and ORL datasets, we varied the two while keeping the other variables constant. For the JAFFE, CMU-PIE, Yale, and MNIST (L = 10) datasets, one balancing parameter is fixed at 0.001, and the remaining two were varied over a candidate range to evaluate their regulatory effects. Visualizations of these parameter analyses are provided in Figs 5 and 6.
As shown in Fig 5, the value of r does not always correlate positively with clustering performance. On the ORL and Yale datasets, the clustering effect declines significantly as r approaches 1, whereas setting r around 0.5 yields the best results. For parameter p, a threshold effect is evident: when p < 0.7, clustering performance is constrained; as p increases, performance improves gradually. However, when p = 1 the ℓ2,p-norm degenerates into the ℓ2,1-norm, which weakens the sparsity constraint and degrades ACC. Based on the experimental results, the optimal range for p is [0.7, 1). Parameter q exhibits a relatively stable influence, with stable and superior clustering performance observed for values within [0.4, 0.9]. To balance generalization ability and clustering performance, dataset-specific values are selected for the ORL, JAFFE, CMU-PIE, Yale, MNIST, and COIL-20 datasets, respectively.
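The role of p can be made concrete with a small numpy sketch of the ℓ2,p column-sparse penalty; at p = 1 it reduces to the familiar ℓ2,1-norm, while p < 1 promotes stronger column sparsity:

```python
import numpy as np

def l2p_penalty(Z, p):
    """||Z||_{2,p}^p: sum over columns of the p-th power of their Euclidean norms."""
    return float(np.sum(np.linalg.norm(Z, axis=0) ** p))
```

For example, a matrix whose columns have norms 5 and 2 yields a penalty of 7 at p = 1 (the ℓ2,1 case) but 29 at p = 2; shrinking p below 1 flattens the penalty on large columns while keeping zero columns free, which is exactly the column-sparsity effect the parameter study probes.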
As shown in Fig 6, after parameter tuning, the ACC and NMI of our algorithm exhibit similar trends across the different datasets, with the clustering performance deteriorating significantly only at extreme parameter settings; as shown in Fig 5, a similar degradation also appears on the Yale dataset. Based on these empirical results, we recommend moderate values for the balancing parameters, noting that q has a relatively stable impact.
Additionally, the running times of 2D-NLRSC on the different datasets are provided. As clearly shown in Table 8, our algorithm exhibits significantly shorter running times than the CLLRSmC and KCLLRSmC algorithms on the ORL, JAFFE, and CMU-PIE datasets. Furthermore, our algorithm achieves higher ACC and NMI values than KCLLRSmC. Overall, our algorithm demonstrates an acceptable time complexity.
6 Convergence analysis
The convergence of the 2D-NLRSC is examined from two distinct perspectives in this section. First, a rigorous theoretical analysis is provided to establish the convergence properties of the proposed method. Second, the convergence behavior is demonstrated through plots of various variables, as shown in Fig 7.
Lemma 2. [31] Suppose F(X) can be written as F(X) = f∘σ(X), where X ∈ ℝ^{m×n} has the SVD X = U Σ Vᵀ, Σ = diag(σ₁, …, σₙ), and f is differentiable. The gradient of F(X) at X is
∂F(X)/∂X = U diag(θ) Vᵀ,
where θ = f′(σ(X)).
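Lemma 2 can be sanity-checked numerically: for F(X) = Σᵢ f(σᵢ(X)) with distinct singular values, the gradient U diag(f′(σ)) Vᵀ should match finite differences. The sketch below uses f(s) = log(1 + s), an arbitrary smooth choice for the check (not the paper's objective):

```python
import numpy as np

def spectral_grad(X, fprime):
    """Gradient of F(X) = sum_i f(sigma_i(X)) via Lemma 2: U diag(f'(sigma)) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(fprime(s)) @ Vt

def numeric_grad(F, X, eps=1e-6):
    """Central finite differences, entry by entry."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (F(X + E) - F(X - E)) / (2 * eps)
    return G
```

Agreement of the two gradients on a random matrix (where distinct singular values hold almost surely) confirms the spectral-function formula the convergence analysis relies on.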
Lemma 3. [38] If the matrix H is positive definite, then
⟨x, Hx⟩ ≥ a⟨x, x⟩ for all x,
where ⟨·, ·⟩ represents the inner product, and a stands for the smallest eigenvalue of H.
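Lemma 3 is the standard Rayleigh-quotient bound, which a quick numerical check illustrates (the test matrix below is a hypothetical positive definite example, not data from the paper):

```python
import numpy as np

def min_eig_bound_holds(H, trials=200, seed=1):
    """Check <x, Hx> >= lambda_min(H) * <x, x> over random x (Lemma 3)."""
    rng = np.random.default_rng(seed)
    a = float(np.linalg.eigvalsh(H)[0])   # smallest eigenvalue
    return all(x @ H @ x >= a * (x @ x) - 1e-9
               for x in rng.standard_normal((trials, H.shape[0])))
```

The bound is tight exactly at the eigenvector of the smallest eigenvalue, which is the constant used later to bound the Laplacian-regularized quadratic term.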
Lemma 4. [38] In the finite-dimensional Euclidean space, every bounded sequence of vectors has a subsequence that converges.
Lemma 5. [39] Consider the nonconvex optimization problem with ℓp regularization
min_x f(x) + λ‖x‖_p^p, (53)
where x ∈ ℝ^n, λ > 0, and 0 < p < 1. Assume the function f satisfies the following conditions: f has an Lf-Lipschitz continuous gradient, i.e., ‖∇f(x) − ∇f(y)‖ ≤ Lf‖x − y‖ for all x, y, and f is bounded below on ℝ^n.
Then the following properties hold:
- (i) If x* is a local minimizer of (53), then it satisfies the first-order stationary condition
[∇f(x*)]_i + λp|x*_i|^{p−1} sign(x*_i) = 0 for every i with x*_i ≠ 0. (55)
- (ii) Let x* be a stationary point whose objective value does not exceed that of some initial point x⁰, for a sufficiently small regularization weight. Then each nonzero component of x* admits a uniform positive lower bound. (56)
Lemma 6. is bounded.
Proof: In order to achieve the minimization of at the
-th step as illustrated in equation (26), the optimal solution
must satisfy the following condition
where . The expression
has a singularity near
. To circumvent this, we propose an approximation for
with
Let be expressed as
. From Definition 4 and Lemma 2, it can be deduced that
Then we have
Therefore, it can be seen that is bounded. We can denote
, where
is the DFT matrix of size
, and
is its conjugate transpose. Given that
, the following result is derived using the chain rule of matrix calculus
is bounded.
From the relations and
, it follows that
is bounded.
Lemma 7. is bounded.
Proof: To prove that is bounded, consider the minimization of
at the
-th step as illustrated in (30). The optimal solution
must satisfy
From the update , it follows that
The boundedness of the norm with power q [40] implies the boundedness of
.
Lemma 8. is bounded.
Proof: To achieve the minimization of at the
-th step as illustrated in (32), the optimal solution
requires to satisfy the following
due to the update by
, then we can obtain
Based on the proof above, auxiliary variable is bounded, which implies that
and its associated auxiliary variable
are also bounded. According to the triangle inequality, it can be obtained that
Clearly, is bounded.
Lemma 9. is bounded.
Proof: Consider the minimization of at the
-th step as illustrated in (35). The optimal solution
must satisfy
Given , the partial derivative of
with respect to
is expressed as follows
Thus, the following inequality holds
From the update , this can be rewritten as
Thus, is bounded.
Theorem 3. Let be the sequence generated by Algorithm 1. Then the sequence
satisfies the following two properties:
- The sequence is bounded.
- Any accumulation point of
is a KKT point of (25).
Proof: 1) Proof of the first part of Theorem 3:
Given the following update rules
It can be deduced that
Summing both sides of Eq (72) from t = 1 to k, we have
Since the sequences and
are bounded, and
It can be found that the right-hand side of (73) is finite. Thus is bounded. We observe
According to Lemma 2, we can find that the right-hand side of (75) is bounded. Since the right-hand side of (75) is nonnegative, every term is also bounded. The boundedness of means that all singular values of
are bounded. Consequently, this ensures that
(the sum of the squares of the singular values) is also bounded. Therefore, the sequence
is bounded.
According to the Lemma 3,
where is the smallest positive eigenvalue of the positive definite matrix L. Considering the expression
is bounded, we can further deduce that
is bounded.
Given that for all
with constant M>0 and
, we prove the sequence
is bounded. For
, the
- norm is defined as
where satisfies
Define for each column j. Then
. As
and p > 0, for all j,
Since preserves element values,
corresponds to the sum of squares over all entries of
with fixed second index i2 = j,
Thus, for any entry ,
Set . Then
for all
. Hence,
is bounded.
Similarly, the boundedness of implies that the sequences
is also bounded. We have
because and
are bounded, we deduce that
is also bounded.
In conclusion, it can be proven that is bounded.
2) Proof of the second part of Theorem 3:
Above we have proved that the sequence generated by Algorithm 1 is ensured to be bounded. According to Lemma 4, which asserts that any bounded sequence in
possesses a convergent subsequence, it follows that
is bound to have at least one point of accumulation. Let one of these points be denoted as
. Suppose, without loss of generality, that
.
Based on the update rule regarding , it can be concluded that
. Then, by taking the limit of both sides of this equation
we obtain .
Similarly, we obtain . Based on the update rule for
, it is evident that
then we can obtain
Considering the -subproblem, it is observed that
Considering the -subproblem, it can be obtained that
Considering the -subproblem, it can be obtained that
It can be seen that
Considering the -subproblem, we obtain
and it implies that
Therefore, satisfies the following KKT conditions
The KKT conditions presented below can be applied to determine the termination criteria for Algorithm 1
where stands for a predefined tolerance. As has been mentioned previously, the sequence complies with the KKT conditions of the Lagrange function (25).
Theorem 4. In our algorithm, the sequences ,
,
,
, and
are Cauchy sequences and converge to their critical points.
Proof of Theorem 4: We first prove that is a Cauchy sequence. From the update rule
, we can infer the following result
where .
Furthermore, this leads to the relation
where the first equality follows from , the second from Theorem 2, and the third from the unitary invariance of the Frobenius norm.
Recall the subproblem
The KKT conditions yield
where is the Lagrange multiplier.
If , it follows that
If and
, it can be concluded that
If and
, this leads to the conclusion that
Thus, in all cases, . Substituting into (94), we obtain the upper bound
Therefore,
Since is bounded by M1 and
with
and
, it can therefore be concluded that
The right-hand side is a convergent geometric series, so is a Cauchy sequence.
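The geometric-series argument can be spelled out. Assuming a per-step bound of the form ‖G^{t+1} − G^t‖_F ≤ M/ρ_t with ρ_t = λ^t ρ₀ and λ > 1 (M, λ, ρ₀ stand in for the paper's constants), one has for any m > n:

```latex
\|\mathcal{G}^{m}-\mathcal{G}^{n}\|_F
  \le \sum_{t=n}^{m-1}\|\mathcal{G}^{t+1}-\mathcal{G}^{t}\|_F
  \le \sum_{t=n}^{m-1}\frac{M}{\lambda^{t}\rho_0}
  \le \frac{M}{\rho_0}\cdot\frac{\lambda^{-n}}{1-\lambda^{-1}}
  \;\xrightarrow{\;n\to\infty\;}\; 0,
```

so the sequence is Cauchy; this is the same pattern reused for each of the remaining variables in Theorem 4.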
Next, we prove that is a Cauchy sequence. Using the update
, we have
Since is bounded and
is a Cauchy sequence, it follows that
Hence, is a Cauchy sequence.
Subsequently, we provide a formal proof that is a Cauchy sequence. Similar to the proof of Lemma 9, we derive the following inequality.
Here, M2 is its upper bound. From the update , it implies
Then,
Utilizing the linearity property of the t-product, we can rewrite the expression as
. Within the framework of tensor t-product, the norm typically satisfies the following inequality
where denotes the spectral norm of tensor
. Because
is a fixed input tensor and its norm is bounded, then
Since it has been proven that is a Cauchy sequence, the right-hand side is bounded.
On the basis of (109) and the boundedness of the sequence, we have
Therefore, is a Cauchy sequence.
Next, we prove that is a Cauchy sequence. From the update
, this yields
where .
The subproblem for is
By Lemma 5, the optimality condition is
Note that if and only if
.
If , then
Since and
, we have
. Moreover, by Lemma 5, there exists a positive lower bound
such that
when
, so
Let M3 be the upper bound in (115). Then for ,
If , then
, and the inequality holds trivially.
Therefore,
Since is bounded by M4 and
grows geometrically, we have
Thus, is a Cauchy sequence.
Finally, the proof that is a Cauchy sequence follows similarly from the update
.
In conclusion, ,
,
,
, and
are all Cauchy sequences and hence converge to the critical points of the objective function
.
Additionally, by running Algorithm 1 on the four datasets above, we practically validated the convergence of the proposed 2D-NLRSC. As shown in Fig 7, the error curves quickly stabilize after a few iterations on each dataset. These two aspects collectively demonstrate that the 2D-NLRSC model exhibits stability and rapidity.
6.1 Convergence rate analysis
We complement the convergence proof by quantifying the rates for both feasibility residuals and objective/value descent under standard assumptions used by augmented Lagrangian and splitting schemes.
Notation 1. Let the penalty parameters satisfy and
with a fixed
. Define the primal feasibility residuals
and the data-consistency residual
Theorem 5 (Geometric decay of feasibility residuals). Suppose ,
,
and
are bounded (as established by Lemmas 6–9). Then there exist finite constants
such that, for all
,
Consequently, since and
with
, all residuals decay R-linearly
Proof: From the optimality of the -subproblem and the update
, we have
The same argument using the and
updates yields the bounds for
and
. For
, the optimality of the
-subproblem together with
gives
Geometric decay follows from .
Remark 1. Theorem 5 implies that the KKT termination test in (92) is met after finitely many iterations, with the count dominated by the largest of the four residuals.
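Under Notation 1, the residual bound of Theorem 5 translates into an explicit iteration count. Assuming r_t ≤ C/ρ_t with ρ_t = λ^t ρ₀ (C a generic constant standing in for the paper's), the tolerance ε in the termination test is reached once:

```latex
\frac{C}{\lambda^{t}\rho_0} \le \varepsilon
\quad\Longleftrightarrow\quad
t \;\ge\; \frac{\log\!\bigl(C/(\rho_0\,\varepsilon)\bigr)}{\log\lambda}
\;=\; O\!\bigl(\log(1/\varepsilon)\bigr),
```

i.e., the R-linear decay of the residuals yields a logarithmic dependence on the target tolerance.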
7 Conclusion
This study proposes 2D-NLRSC, a submodule clustering framework using nonconvex low-rank tensor approximation. Unlike traditional vectorization approaches, it rotates each 2D image into a lateral slice of a third-order tensor and employs the t-product to derive self-representative tensors. The key innovations involve an ℓr-induced tensor nuclear norm for optimal low-rank representation, along with combined ℓ2,p column-sparse and Laplacian regularization to enhance structure capture while effectively handling noise and redundancy. Theoretical convergence is demonstrated through the KKT conditions. Experiments on ORL, JAFFE, CMU-PIE, Yale, MNIST, and COIL-20 show superior clustering accuracy and computational efficiency. Despite these advantages, this study still has certain limitations. The performance of 2D-NLRSC relies on the selection of several hyperparameters, which may require adjustment when applied to new datasets. Moreover, although relatively efficient, the computational complexity of its iterative optimization remains high for very large-scale image data. Future work will focus on automating hyperparameter selection and extending the framework to handle higher-order imaging data.
References
- 1.
Wright J, Ma Y. High-dimensional data analysis with low-dimensional models: principles, computation, and applications. Cambridge: Cambridge University Press; 2022.
- 2. Elhamifar E, Vidal R. Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell. 2013;35(11):2765–81. pmid:24051734
- 3. Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell. 2013;35(1):171–84. pmid:22487984
- 4. Zhou Q, Che H, Guo W, He X, Leung MF, Wen S. Robust low-rank tensor constrained orthogonal symmetric non-negative matrix factorization for multi-layer networks community detection. IEEE Transactions on Emerging Topics in Computational Intelligence.
- 5. Pu X, Che H, Pan B, Leung M-F, Wen S. Robust weighted low-rank tensor approximation for multiview clustering with mixed noise. IEEE Trans Comput Soc Syst. 2024;11(3):3268–85.
- 6. Xie F, Yuan J, Nie F, Li X. Dual-bounded nonlinear optimal transport for size constrained min cut clustering. arXiv preprint arXiv:2501.18143, 2025.
- 7. Shi L, Shen Z, Yan J. Double-bounded optimal transport for advanced clustering and classification. AAAI. 2024;38(13):14982–90.
- 8. Braman K. Third-order tensors as linear operators on a space of matrices. Linear Algebra and its Applications. 2010;433(7):1241–53.
- 9. Kilmer ME, Martin CD. Factorization strategies for third-order tensors. Linear Algebra and its Applications. 2011;435(3):641–58.
- 10. Kilmer ME, Braman K, Hao N, Hoover RC. Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J Matrix Anal & Appl. 2013;34(1):148–72.
- 11. Li Q, Yang G. Multi-view clustering via global-view graph learning. PLoS One. 2025;20(6):e0321628. pmid:40455775
- 12. Zhuge W, Hou C, Jiao Y, Yue J, Tao H, Yi D. Robust auto-weighted multi-view subspace clustering with common subspace representation matrix. PLoS One. 2017;12(5):e0176769. pmid:28542234
- 13. Chen H, Liu X. Reweighted multi-view clustering with tissue-like P system. PLoS One. 2023;18(2):e0269878. pmid:36763648
- 14. Wu T, Bajwa WU. A low tensor-rank representation approach for clustering of imaging data. IEEE Signal Process Lett. 2018;25(8):1196–200.
- 15. Yin M, Gao J, Xie S, Guo Y. Multiview subspace clustering via tensorial t-product representation. IEEE Trans Neural Netw Learn Syst. 2019;30(3):851–64. pmid:30059323
- 16. Wu T. Graph regularized low-rank representation for submodule clustering. Pattern Recognition. 2020;100:107145.
- 17. Francis J, Madathil B, George SN, George S. A robust tensor-based submodule clustering for imaging data using ℓ12 regularization and simultaneous noise recovery via sparse and low rank decomposition approach. J Imaging. 2021;7(12):279. pmid:34940746
- 18. Madathil B, George SN. Noise robust image clustering based on reweighted low rank tensor approximation and ℓ12 regularization. SIViP. 2020;15(2):341–9.
- 19. Francis J, M B, George SN. A unified tensor framework for clustering and simultaneous reconstruction of incomplete imaging data. ACM Trans Multimedia Comput Commun Appl. 2020;16(3):1–24.
- 20. Candès EJ, Wakin MB, Boyd SP. Enhancing sparsity by reweighted ℓ1 minimization. J Fourier Anal Appl. 2008;14(5–6):877–905.
- 21.
Zuo W, Meng D, Zhang L, Feng X, Zhang D. A generalized iterated shrinkage algorithm for non-convex sparse coding. In: 2013 IEEE International Conference on Computer Vision. 2013. p. 217–24. https://doi.org/10.1109/iccv.2013.34
- 22. Zha Z, Yuan X, Wen B, Zhou J, Zhang J, Zhu C. A benchmark for sparse coding: when group sparsity meets rank minimization. IEEE Trans Image Process. 2020. https://doi.org/10.1109/TIP.2020.2972109. pmid:32167891
- 23. Peng C, Zhang Y, Chen Y, Kang Z, Chen C, Cheng Q. Log-based sparse nonnegative matrix factorization for data representation. Knowl Based Syst. 2022;251:109127. pmid:40809933
- 24. Semerci O, Hao N, Kilmer ME, Miller EL. Tensor-based formulation and nuclear norm regularization for multienergy computed tomography. IEEE Trans Image Process. 2014;23(4):1678–93. pmid:24808339
- 25. Xie D, Yang M, Gao Q, Song W. Non-convex tensorial multi-view clustering by integrating l1-based sliced-Laplacian regularization and l2,p-sparsity. Pattern Recognition. 2024;154:110605.
- 26. Kernfeld E, Aeron S, Kilmer M. Clustering multi-way data: a novel algebraic approach. arXiv preprint 2014. https://arxiv.org/abs/1412.7056
- 27.
Lang S. Algebra. New York: Springer; 2005.
- 28. Wu T. Online tensor low-rank representation for streaming data clustering. IEEE Trans Circuits Syst Video Technol. 2023;33(2):602–17.
- 29.
Wu T. Robust data clustering with outliers via transformed tensor low- rank representation. In: International Conference on Artificial Intelligence and Statistics; 2024. p. 1756–64.
- 30. Francis J, Madathil B, George SN, George S. A nonconvex low rank and sparse constrained multiview subspace clustering via l1-induced tensor nuclear norm. IEEE Trans Signal Inf Process Netw. 2023.
- 31. Kang Z, Peng C, Cheng J, Cheng Q. LogDet rank minimization with application to subspace clustering. Comput Intell Neurosci. 2015;2015:824289. pmid:26229527
- 32.
Horn RA, Johnson CR. Matrix analysis. Cambridge: Cambridge University Press; 2012.
- 33.
Lu C, Min H, Zhao ZQ, Zhu L, Huang DS, Yan S. Robust and efficient subspace segmentation via least squares regression. In: Proceedings of the European Conference on Computer Vision. 2012. p. 347–60.
- 34. Tang K, Liu R, Su Z, Zhang J. Structure-constrained low-rank representation. IEEE Trans Neural Netw Learn Syst. 2014;25(12):2167–79. pmid:25420240
- 35. Heckel R, Bolcskei H. Robust subspace clustering via thresholding. IEEE Trans Inform Theory. 2015;61(11):6320–42.
- 36. Li C-G, You C, Vidal R. Structured sparse subspace clustering: a joint affinity learning and subspace clustering framework. IEEE Trans Image Process. 2017;26(6):2988–3001. pmid:28410106
- 37.
Patel VM, Vidal R. Kernel sparse subspace clustering. In: 2014 IEEE International Conference on Image Processing (ICIP), 2014. 2849–53. http://dx.doi.org/10.1109/icip.2014.7025576
- 38.
Lax PD. Linear algebra and its applications. New York: Wiley; 2007.
- 39. Lu Z. Iterative reweighted minimization methods for lp regularized unconstrained nonlinear programming. Math Program. 2013;147(1–2):277–307.
- 40. Ito K, Kunisch K. A variational approach to sparsity optimization based on Lagrange multiplier theory. Inverse Problems. 2013;30(1):015001.