Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Multi-view clustering via global-view graph learning

  • Qin Li,

    Roles Conceptualization, Funding acquisition, Methodology, Writing – original draft

    Affiliation School of Computer and Software Engineering, Shenzhen Institute of Information Technology, Shenzhen, China

  • Geng Yang

    Roles Formal analysis, Validation, Writing – review & editing

    yangg@sziit.edu.cn

    Affiliation School of Computer and Software Engineering, Shenzhen Institute of Information Technology, Shenzhen, China

Abstract

Multiview clustering aims to improve clustering performance by exploring multiple representations of data and has become an important research direction. Meanwhile, graph-based methods have been extensively studied and have shown promising performance in multiview clustering tasks. However, most existing graph-based multiview clustering methods rely on assigning appropriate weights to each view based on its importance, with the clustering results depending on these weight assignments. In this paper, we propose an a novel multiview spectral clustering framework with reduced computational complexity that captures complementary information across views by optimizing a global-view graph using adaptive weight learning. Additionally, in our method, once the Global-view Graph is obtained, cluster labels can be directly assigned to each data point without the need for any post-processing, such as the K-means required in standard spectral clustering. Our method not only improves clustering performance but also reduces computational resource consumption. Experimental results on real-world datasets demonstrate the effectiveness of our approach.

Introduction

In many practical applications such as video surveillance and image retrieval, heterogeneous features representing the same instance can be obtained. For example, images can be represented by various descriptors such as SIFT [1], HOG [2], GIST [3], and LBP [4]; webpages can be described by their content, the text of the pages that link to them, and the link structure of the linked pages. Since these heterogeneous features summarize the characteristics of objects from different perspectives, they are considered as multiple views of the data. Multiview learning, which aims to explore information from different views to improve learning performance, has become an important research direction [5,6].

Clustering is an unsupervised learning task that divides objects into meaningful groups. To appropriately integrate information from multiple views in clustering, many multiview clustering methods have been proposed [710]. These methods can be broadly categorized into three main approaches: spectral clustering-based methods, subspace clustering-based methods, and other advanced techniques such as tensor-based clustering.

Spectral clustering-based methods

Spectral clustering has become one of the most popular modern clustering algorithms. It is easy to implement, can be efficiently solved using standard linear algebra software, and generally outperforms traditional clustering algorithms such as k-means. Graph-based multi-view clustering algorithms, which extend spectral clustering to handle multi-view data, have shown good performance and have been widely studied. Kumar et al. [11] extended spectral clustering by co-regularizing the clustering assumptions of different views, ensuring that the graphs from different views are consistent with each other. Cai et al. [12] developed a multimodal spectral clustering algorithm to learn a common Laplacian matrix by imposing a non-negative constraint on the relaxed clustering assignment matrix. Nie et al. [13] proposed the Auto-weighted Multiple Graph Learning (AMGL) algorithm, which automatically learns a set of weights for all graphs without additional parameters. Xia et al. [14] proposed a Markov chain-based method for robust multi-view spectral clustering (RMSC). Based on bipartite graphs, Li et al. [15] used local manifold fusion to integrate heterogeneous features and proposed a new large-scale multi-view spectral clustering method (MVSC).

Subspace clustering-based methods

Subspace clustering-based approaches focus on finding a low-dimensional subspace that captures the shared structure across multiple views. Zhan et al. [16] introduced Multiview Consensus Graph Clustering (MCGC), which combines consensus matrix learning with subspace clustering to enhance clustering performance. Luo et al. [17] proposed Consistent and Specific Multi-view Subspace Clustering (CSMSC), which models both the consistency and specificity of multi-view data to learn a shared subspace representation. While these methods are effective in leveraging complementary information across views, they often face challenges with computational complexity and scalability.

Constrained Laplacian Rank (CLR) methods

The Constrained Laplacian Rank (CLR) model [18] is a significant advancement in graph-based clustering. CLR constructs a similarity matrix by imposing a rank constraint on the Laplacian matrix, ensuring that the resulting similarity matrix has exactly c connected components (where c is the number of clusters). This guarantees that the clusters are clearly separated in the graph structure.

Unlike traditional spectral clustering methods that require additional post-processing such as k-means, CLR directly assigns cluster labels to each data point. This avoids potential errors introduced by post-processing and improves efficiency. However, CLR was initially designed for single-view clustering and cannot simultaneously handle information from multiple views.

To address this, Nie et al. [19] extended CLR to multi-view clustering by proposing methods such as Parameter-Weighted Multi-view Clustering (PwMC) and Self-Weighted Multi-view Clustering (SwMC). While effective, these methods have certain limitations, such as sensitivity to weight parameters and the inability to fully integrate global and complementary information from multiple views.

Tensor analysis

Recent developments in tensor analysis and deep learning have inspired advanced methods for multi-view clustering. Tensor analysis-based methods [2022] exploit the higher-order structure of multi-view data to capture relationships between views, achieving promising results. However, these methods generally involve high computational complexity.

Although these approaches demonstrate effectiveness in specific scenarios, they still have certain limitations. Most methods fail to fully integrate the complementary information between views and the global structure of the data.

Our contributions

To address these limitations, we propose a new method based on CLR: Multi-view Clustering via Global-view Graph Learning (MCGGL). The proposed method constructs a Global Affinity Matrix that simultaneously captures the specificity of each view and the complementary information between different views. Unlike traditional methods, our approach directly optimizes the global similarity graph, enabling clustering label assignment without the need for post-processing. This results in improved clustering performance and computational efficiency. Experimental results on real-world datasets demonstrate the effectiveness of our proposed method.

Notations

In the entire text, all matrices are denoted by uppercase letters. For a matrix M, the ith row and the th element of M are denoted by mi and mij, respectively. The trace of M is denoted as . The th view of matrix M is denoted as . The transpose of matrix M is denoted as MT. The L2-norm of a vector is denoted as , and the Frobenius norm of a matrix M is denoted as . Specifically, we use 1n to represent an n-dimensional column vector where each element is 1.

Related work

CLR clustering

Given an initial affinity matrix , CLR aims to learn a new similarity matrix that has exactly c connected components, where n is the number of data points and c is the number of clusters. The Laplacian matrix associated with S is defined as , where DS is a diagonal matrix with its ith diagonal element being .

The Laplacian matrix has an important property as follows [23]:

Theorem 1: The multiplicity of the eigenvalue 0 in the Laplacian matrix LS is equal to the number of connected components in the graph associated with S.

Based on this observation, Nie et al. [18] constrained the rank of LS to be and proposed the following CLR model for graph clustering based on L1 norm and L2 norm distances:

(1)(2)

However, these problems seem difficult to solve because , and DS also depends on S, while the constraint is a complex nonlinear constraint. Nie et al. [18] proposed a novel and effective algorithm to address these issues.

Let denote the ith smallest eigenvalue of LS. Note that since LS is positive semidefinite. Problem (1) is equivalent to the following problem for sufficiently large values of :

(3)

When is sufficiently large, holds for each i, so the optimal solution S to problem (3) will make the second term equal to zero, thereby satisfying the constraint in problem (1).

According to Ky Fan’s theorem [24],

Theorem 2: For a Hermitian matrix, the sum of the smallest k eigenvalues is equal to the minimum trace of the matrix formed by projecting onto any k-dimensional subspace.

This give us:

(4)

Thus, problem (3) is further equivalent to the following problem:

(5)

Compared to the original problem (1), problem (5) is easier to solve. It has been demonstrated that CLR achieves superior performance in clustering [18]. Note that the CLR method is applicable only to single-view data; it cannot simultaneously handle graphs from multiple views.

Parameter-weighted multi-view clustering (PwMC)

CLR is a single-view graph-based clustering method. Nie et al. [19] extended this technique to the field of multi-view clustering. For multi-view data, let m be the number of views, and be the corresponding input sample matrices, where . Each view should be assigned a weight to measure its importance, and this idea can be naturally modeled by minimizing a linear combination of the reconstruction error for each view. Therefore, the constructed objective can be written as:

(6)

where and . The second term in problem (3) is used to smooth the weight distribution. The constraints and ensure that S is a non-negative matrix and each row represents a normalized probability distribution, maintaining the validity and stability of the similarity matrix.

PwMC involves an undesirable parameter . In an unsupervised learning setting without labeled instances, cannot be obtained through traditional supervised hyperparameter tuning techniques such as cross-validation. Additionally, it was observed that the final experimental performance is very sensitive to , and the optimal value of varies across different datasets, making PwMC (and related multi-view clustering methods that use similar weight learning strategies) impractical. Therefore, to remove while retaining much of the accuracy, Nie et al. [19] further proposed a new self-weighted multi-view clustering (SwMC) method.

Self-weighted multi-view clustering (SwMC)

For Self-Weighted Multi-View Clustering (SwMC), the proposed objective function is:

(7)

After a series of transformations, Nie et al. [19] rewrote equation (7) into the following form, allowing the weight parameters to be learned adaptively:

(8)

Proposed method

Problem formulation and objective function

To effectively capture the complementary information between different views while considering the specific information of each view to obtain optimal global information, we designed the following multi-view clustering objective function.

(9)

In our model, i represents the ith row, j represents the jth column, m represents the number of views. Specifically, based on the initial affinity matrix of each given view, we constructed a Global Affinity Matrix A and a Global Similarity Matrix S. We then capture the complementary information between different views by minimizing the distance between S and A, and capture the specific information of each view by minimizing the distance between S and . By analyzing the Global-view Graph corresponding to S, clustering labels can be directly assigned to each data point without any post-processing, such as the K-means step required in standard spectral clustering.

represents the initial affinity matrix learned from the sample matrix of the th view. represents the initial affinity matrix learned from the global sample matrix X. The method for calculating the initial affinity matrix follows the standard CLR model [18].

The global sample matrix is obtained by stacking the sample matrices of each view. For example, if there are three sample matrices X(1), X(2), and X(3) from three views, we convert each sample matrix into a column vector, forming a global sample matrix .

S is the global similarity matrix to be solved. The Laplacian matrix associated with S is defined as , where DS is a diagonal matrix with diagonal elements . H is the clustering label matrix to be solved.

Specifically, the complementary information is captured by minimizing the discrepancy between the Global Similarity Matrix S and both the Global Affinity Matrix A and view-specific affinity matrices . Mathematically, the objective function can be expressed as:

(10)

Here, and W are adaptive weights that dynamically adjust the contribution of each view and the global information, respectively. By jointly optimizing these components, the proposed method integrates the complementary strengths of individual views and global data patterns, ensuring robust clustering results.

The complementary information between views is explicitly captured by minimizing the term . This enforces alignment between the Global Similarity Matrix S and individual view-specific matrices , ensuring that S incorporates the unique contributions of each view. At the same time, the term promotes consistency with the Global Affinity Matrix A, which represents an aggregated perspective of all views. By balancing these terms, the method ensures that S reflects both shared and unique characteristics of the data.

Optimization

We developed the following optimization scheme to solve our objective function (9). For convenience, we present the objective function (9) in the following form.

(11)

Meanwhile, the objective function (9) with the rank constraint is difficult to solve. Based on Theorems 1 and 2, we relaxed it by replacing the rank constraint with the minimization of for easier optimization.

Inspired by SwMC, we learn the weight parameters for each view and the global view adaptively.

(12)

Then problem (11) becomes:

(13)

For problem (13), we can solve S, W, and iteratively.

(1) Fix W and , update S:

i. Solving H when S is fixed:

When S is fixed, problem (12) becomes:

(14)

It is known that the optimal solution for H consists of the c eigenvectors of corresponding to the smallest c eigenvalues. Since , problem (13) can be efficiently solved using the Arnoldi iteration method.

ii. Solving S when H is fixed:

When H is fixed, problem (12) becomes:

(15)

Since problem (12) is independent for different i, we can solve the following problem for each i separately:

(16)

For simplicity, let , and let be a vector whose jth element is equal to . Since the jth element of ai equals aij, and the jth element of equals , problem (15) can be written in the following vector form:

(17)

This problem is a constrained convex optimization problem. The objective function is the squared L2-norm of si, which is a convex function. The constraints include , which is a linear constraint requiring the sum of all elements to equal 1, and , which is an element-wise non-negativity constraint.

Linear constraints and non-negativity constraints are both convex sets, so this is a typical convex optimization problem with a convex set as the constraint space. It can be solved using classical methods such as the projected gradient method, where the vector is projected back onto the constraint space during gradient descent. For this type of problem, when the matrix to be updated in each iteration of the projected gradient method is sparse, meaning that only k vectors in the entire linear space need to be updated at each iteration, the "sparse projected gradient method" proposed by Duchi et al. [25]. Duchi et al. provides an efficient solution, reducing the complexity of each iteration to . Referring to the case of a positive simplex as the constraint space in [25], to accelerate the solution of S, in each iteration, we choose to update only the k neighbors of data i. We set k to a small constant to ensure that S is completely sparse, then use the sparse projected gradient method to quickly complete an iteration.

(2) Fix S, update W and :

Based on the current S, use equation (11) to calculate the current W and .

The above optimization scheme can be expressed as the following pseudocode (Algorithm 1):

Algorithm 1. Solve the objective function.

Convergence analysis

Lemma 1. Nie et al. [26] proved that for any positive numbers u and , the following inequality holds:

(18)

Based on Lemma 1, we can prove that the Algorithm 1 will monotonically decrease the objective of Eq.(9) in each iteration. The proof is as follows.

Proof: According to Algorithm 1, we have

(19)

where t and t + 1 denote the t-th and -th iterations, respectively.

According to Lemma 1, we have

(20)(21)

Thus, for all views, we have

(22)

By simple algebra, we obtain

(23)

By combining Eq. (19) and Eq. (23)

(24)

Thus, the Algorithm 1 will monotonically decrease the objective of Eq.(9) in each iteration.

Complexity analysis

The computational complexity of the proposed algorithm is primarily determined by the solutions for four variables: S, H, W, and . Given that , m is neglected in the analysis.. The solution for S uses the sparse projected gradient method based on [25]. Under the condition that only the k neighbors of data i are updated in each iteration, the complexity is . The solution for H uses the Arnoldi iteration method, with a complexity of approximately . The computational complexity for W and is O(n2), but since only addition and subtraction are involved in each calculation, the actual computation is very fast. In summary, given that , the computational complexity of our algorithm in one iteration is approximately .

Experimental results and analysis

Hardware setup and execution time

The experiments were conducted on a machine equipped with an Intel Xeon Platinum 8352V CPU (2 processors, 2.10GHz base frequency, 3.50GHz turbo frequency) and 128GB RAM. No GPU was used in the experiments, demonstrating the efficiency of our method even in a CPU-only environment. Table 1 summarizes the execution times for each dataset. The results demonstrate that our method is efficient and suitable for deployment on resource-limited devices.

Convergence behavior

To visually demonstrate the convergence behavior of the proposed algorithm, we plotted the convergence curves for the optimization process on the ORL and Yale datasets. Fig 1 illustrates how the objective function value changes over the iterations.

thumbnail
Fig 1. Convergence curves of the proposed algorithm on the ORL and Yale datasets.

The objective function value decreases rapidly in the first few iterations and stabilizes as the algorithm converges.

https://doi.org/10.1371/journal.pone.0321628.g001

The curves show that the algorithm converges within a small number of iterations, indicating its efficiency in solving the optimization problem. The convergence curves are consistent across datasets, demonstrating the robustness of our method.

Visualization of clustering results

To provide an intuitive understanding of the clustering performance, we visualized the extracted features for the Yale and ORL datasets using t-SNE.

Fig 2 demonstrates the clustering results for the Yale (a) and ORL (b) datasets. Each point represents an instance, and the colors indicate the ground-truth cluster assignments. It can be observed that the extracted features group naturally into distinct clusters, demonstrating the effectiveness of our clustering algorithm. For both datasets, the majority of clusters are well-separated, indicating good clustering performance.

thumbnail
Fig 2. Visualization of clustering results on Yale and ORL datasets.

(a) Clustering visualization for the Yale dataset using t-SNE, showing distinct clusters corresponding to extracted features. (b) Clustering visualization for the ORL dataset using t-SNE, illustrating clear separation between clusters.

https://doi.org/10.1371/journal.pone.0321628.g002

Performance comparison

In this section, we evaluate and compare the performance of the proposed method on four widely used multi-view datasets:

The MSRC-v1 dataset [27] consists of 240 images in eight categories. Following [28], we selected 7 categories with 30 images each and extracted five visual features: 24-D color moments, 512-D GIST, 576-D HOG, 254-D CENTRIST, and 256-D LBP.

The Caltech101 dataset [29] includes 101 categories. We used 441 samples from 7 categories, constructing three views: 2560-D SIFT, 1160-D LBP, and 620-D HOG.

The ORL dataset contains 400 facial images of 40 individuals under varying conditions. We constructed three views: 6750-D Gabor, 4096-D intensity, and 3340-D LBP features.

The Yale dataset has 165 images of 15 individuals under different conditions. We constructed three views: 3304-D LBP, 6750-D Gabor, and 4096-D intensity features.

We compare our method with the following five multi-view clustering algorithms:

  1. (CSMSC): A subspace learning method that captures the consistency information of multiple views;
  2. Auto-weighted Multi-Graph Learning (AMGL) [18]: Constructs graphs for each single view and automatically learns the optimal weight for each graph;
  3. Robust Multi-view Spectral Clustering (RMSC) [14]: Recovers a latent transition probability matrix from the matrices computed from each single view for multi-view clustering;
  4. Multi-view Clustering based on Consensus Matrix Learning (MCGC) [16];
  5. Multi-view Clustering based on Adaptive Graph Learning (MVGL) [17].

To evaluate performance, we use Accuracy (ACC), Normalized Mutual Information (NMI), and Purity as clustering performance measures. Higher values of these three metrics indicate better performance.

For each comparison method, we adjusted the parameters to achieve optimal results. Since the performance of K-means clustering is highly sensitive to the choice of initial centroids, all methods involving K-means are repeated 10 times, and the average results are reported. As for our MCGGL, we run it only once.

Table 2 lists the ACC, NMI, and Purity values of the seven methods on the four datasets mentioned above, where SC-best refers to performing spectral clustering on each individual view and selecting the best result.

thumbnail
Table 2. The clustering performances on Caltech101, MSRC-V1, Yale, and ORL datasets.

https://doi.org/10.1371/journal.pone.0321628.t002

From observing Table 2, we can draw the following conclusions:

  • Multi-view clustering methods generally outperform SC-best and achieve the best performance. The reason may be that multi-view representations provide more information compared to single-view representations. By utilizing more information, multi-view methods achieve better clustering results.
  • CLR-based multi-view clustering methods, such as MCGC and our proposed method, outperform most other methods on most datasets. This may be because CLR-based clustering methods directly obtain the clustering labels of the data, whereas other methods rely on the quality of the input graphs and require additional post-processing steps, such as K-means, which can lead to suboptimal solutions.
  • Our proposed method outperforms other clustering methods. This may be because our method, during the Global-view Graph learning process, effectively captures the specificity of each view while also considering the complementary information between different views, thus fully integrating the complementary information and global information of the entire dataset.
  • Our method achieves relatively good clustering accuracy across multiple multi-view datasets, including Caltech101, MSRC-v1, Yale, and ORL. This consistent high performance demonstrates the strong generalization ability of our model, making it adaptable to the characteristics of different datasets and capable of producing robust clustering results.

Parameter analysis

In this section, we analyze the effect of the parameter on our algorithm. We adjusted the parameter to values , conducted ten experiments, and compared the results. Similarly, we chose ACC, NMI, and Purity as evaluation metrics.

Fig 3 shows the clustering performance as varies across four datasets. Without parameter tuning, the method cannot fully capture complementary and global information from multiple views. From Fig 3, we can see that as changes, our method shows significant fluctuations, and when , our method performs worse across all four datasets compared to the best performance when . This also indicates that Global-view Graph Learning helps improve clustering performance.

thumbnail
Fig 3. The effect of different values of on clustering metrics with different datasets.

(a) MSRC-V1, (b) Caltech101, (c) ORL, and (d) Yale. The figure illustrates how the clustering metrics change as the value of varies.

https://doi.org/10.1371/journal.pone.0321628.g003

Discussion

Our method effectively integrates complementary information between different views and the global information of the entire dataset. This integration ensures superior performance compared to various spectral clustering methods across multiple datasets, as demonstrated by consistent improvements in clustering metrics such as ACC, NMI, and Purity.By constructing a Global-view Graph, our approach captures both the specificity of each view and the complementary information between views, which leads to robust clustering results.

In terms of computational efficiency, our method leverages the CLR-based framework to achieve low computational complexity. Solving the proposed optimization objective results in a computational complexity of approximately per iteration, making it more suitable for large-scale clustering tasks than many existing approaches.

Spectral clustering methods based on tensor analysis, in contrast, typically have a higher computational complexity of O(n3). For instance, the computational complexity of a single iteration in Xia et al. [21] is approximately . While tensor analysis-based methods perform well in terms of accuracy, their high computational cost limits their practical application in resource-constrained environments.

Clustering networks based on deep learning also exhibit high computational complexity. For example, consider a deep autoencoder network with depth L. Assuming n samples aggregated into c clusters, and with both the sample dimension and hidden layer feature dimension as d, a single iteration involves the following operations:

  1. (1) Forward propagation through each layer of the encoder network with a complexity of .
  2. (2) Backpropagation with a similar complexity of .
  3. (3) Updating the clustering loss, assuming k-means is used, with a complexity of .

The total complexity is . Given that d2>n, this complexity is often higher than the square of the number of samples and grows rapidly with the depth of the network.

If large models like BERT [30] are used for clustering tasks, the computational complexity increases further due to the Transformer architecture [31]. Assuming the self-attention layer has h heads, the main operations involved in a single iteration include:

  1. (1) Dense matrix multiplication in the self-attention layer during forward propagation, with a complexity of .
  2. (2) Dense matrix multiplication in the feedforward network (FFN) during forward propagation, with a complexity of .
  3. (3) Backpropagation, which has a complexity similar to forward propagation, .
  4. (4) Updating the clustering loss, with a complexity of .

The total complexity of a single iteration of BERT is approximately . This highlights that deep learning-based approaches, especially those using large models, impose significant computational demands.

In summary, our method achieves a favorable balance between performance and computational efficiency. By reducing algorithmic complexity while maintaining high clustering accuracy, our approach is well-suited for deployment in resource-constrained environments. Potential applications include devices such as mobile phones, tablets, drones, and edge computing gateways. Furthermore, our method’s efficiency and performance make it a strong candidate for large-scale real-world clustering tasks, where computational resources may be limited.

While the proposed algorithm demonstrates superior performance and efficiency, we acknowledge its sensitivity to the parameter , which balances the contribution of global and view-specific information during clustering. As shown in our parameter analysis section, the choice of significantly impacts the clustering results. Improper tuning of can lead to suboptimal performance, particularly for datasets with high variability in view-specific or global information. To mitigate this issue, future work could explore automatic parameter selection methods or parameter-free approaches to enhance the robustness of the algorithm.

Conclusion and future work

In this paper, we propose a CLR-based multi-view clustering method to learn a Global-view Graph with exactly c connected components, which is an ideal structure for clustering. The proposed Multi-view Clustering via Global-view Graph Learning (MCGGL) aims to extract the specific information of each view while considering the complementary information between different views, thereby fully integrating the complementary information between different views and the global information of the overall data. To achieve this goal, we construct an objective function and derive a simple and effective optimization algorithm to solve this objective function, resulting in low computational complexity for our algorithm. Our method demonstrates significant improvements in clustering performance across diverse datasets by fully integrating complementary and global information from multiple views. Experimental results show that the clustering performance of our proposed method is superior to other clustering methods.

In the future, we will attempt to integrate CLR-based, tensor analysis-based, and deep learning-based multi-view clustering methods. For example, we will introduce the idea of directly assigning clustering labels to data from CLR into tensor analysis-based multi-view clustering methods to ensure that the model achieves an optimal solution. Furthermore, we can attempt to construct a tensor analysis neural network connected to a deep auto-encoder network, creating an end-to-end tensor analysis-based multi-view clustering network. This approach would fully leverage the advantages of tensor analysis in multi-view data fusion, increase the interpret-ability of deep learning networks, and effectively control the network’s width and depth, thereby improving clustering performance while reducing computational resource consumption.

References

  1. 1. David G Lowe. Distinctive image features from scale-invariant keypoints. Int J Comput Vis. 2004;60(2):91–110.
  2. 2. Dalal N, Triggs B. Histograms of oriented gradients for human detection. CVPR, vol. 1; 2005. p. 886–93.
  3. 3. Oliva A, Torralba A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int J Comput Vis. 2001;42(3):145–75.
  4. 4. Timo Ojala, Matti Pietikäinen, Topi Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell. 2002;24(7):971–87.
  5. 5. Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. Comput Sci. 2013.
  6. 6. Hou C, Nie F, Tao H, Yi D. Multi-view unsupervised feature selection with adaptive similarity and view weight. IEEE Trans Knowl Data Eng. 2017; PP(99):1–1.
  7. 7. Chaudhuri Kamalika, Kakade Sham M., Livescu Karen, Sridharan Karthik. Multi-view clustering via canonical correlation analysis. ICML; 2009. p. 129–36.
  8. 8. Kumar Abhishek, Daumé Hal. A co-training approach for multi-view spectral clustering. ICML; 2011. p. 393–400.
  9. 9. Liu Jialu, Wang Chi, Gao Jing, Han Jiawei. Multi-view clustering via joint nonnegative matrix factorization. ICDM; 2013. p. 252–60.
  10. 10. Cai Xiao, Nie Feiping, Huang Heng, Kamangar Farhad. Heterogeneous image feature integration via multi-modal spectral clustering. IEEE; 2011. p. 1977–84. 2011.
  11. 11. Kumar Abhishek, Rai Piyush, Daume Hal. Co-regularized multi-view spectral clustering. NIPS; 2011. 1413–21.
  12. 12. Cai X, Nie F, Huang H. Multi-view k-means clustering on big data. IJCAI; 2013. 2598–604.
  13. 13. Nie Feiping, Li Jing, Li Xuelong. Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification. Proceedings of the twenty-fifth international joint conference on artificial intelligence; 2016. p. 1881–87.
  14. 14. Xia R, Pan Y, Du L, Yin J. Robust multi-view spectral clustering via low-rank and sparse decomposition. AAAI; 2014. p. 2149–55.
  15. 15. Yeqing Li, Feiping Nie, Heng Huang, and Junzhou Huang. Large-scale multi-view spectral clustering via bipartite graph. AAAI; 2015. p. 2750–6.
  16. 16. Zhan K, Nie F, Wang J, Yang Y. Multiview consensus graph clustering. IEEE Trans Image Process. 2019;28:1261–70. pmid:30346283
  17. 17. Luo S, Zhang C, Zhang W, Cao X. Consistent and specific multi-view subspace clustering. Proceedings of the Proc. AAAI; 2018. p. 3730–7.
  18. 18. Feiping Nie, Xiaoqian Wang, Michael I. Jordan, and Heng Huang. The constrained laplacian rank algorithm for graph-based clustering. AAAI; 2016.
  19. 19. Feiping Nie, Jing Li, and Xuelong Li. Self-weighted multiview clustering with multiple graphs. Proceedings of the twenty-fifth international joint conference on artificial intelligence; 2017. p. 2564–70.
  20. 20. Xu H, Zhang X, Xia W, Gao Q, Gao X. Low-rank tensor constrained co-regularized multi-view spectral clustering. Neural Netw. 2020;132:245–52. pmid:32927427
  21. 21. Xia W, Zhang X, Gao Q, Shu X, Han J, Gao X. Multi-view subspace clustering by an enhanced tensor nuclear norm. IEEE Trans Cybern. 2021.
  22. 22. Qin Li, Geng Yang, Yu Yun, Yu Lei, and Jane You. Tensorized discrete multi-view spectral clustering. Electronics. 2024.
  23. 23. Chung RK. Spectral graph theory, vol. 92. American Mathematical Society; 1997.
  24. 24. Fan K. On a theorem of Weyl concerning eigenvalues of linear transformations. Proc Natl Acad Sci. 1949;35(11):652–5.
  25. 25. Duchi J, Shalev-Shwartz S, Singer Y, Chandra T. Efficient projections onto the l1-ball for learning in high dimensions. Proceedings of the 25th international conference on machine learning. ACM; 2008. p. 272–9.
  26. 26. Feiping Nie, Heng Huang, Xiao Cai, Chris H. Ding. Efficient and robust feature selection via joint L2,1-norms minimization. Advances in neural information processing systems; 2010. p. 1813–21.
  27. 27. Winn John M, Jojic Nebojsa. LOCUS: Learning object classes with unsupervised segmentation. 10th IEEE international conference on computer vision (ICCV 2005), 17–20 October 2005. Beijing, China; 2005. p. 756–63.
  28. 28. Lee Yong Jae, Grauman Kristen. Foreground focus: Unsupervised learning from partially matching images. Int J Comput Vis. 2009;85(2):143–66.
  29. 29. Fei-Fei L, Fergus R, Perona P. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Comput Vis Image Understand. 2007;106:59–70.
  30. 30. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. 2019.
  31. 31. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. arXiv:1706.03762. 2017.