Abstract
Incomplete multi-view clustering (IMVC) is an unsupervised technique for clustering multi-view data when some view information is absent. However, most existing IMVC methods usually suffer from several significant challenges: (1) Inaccurate imputation or padding of missing data degrades clustering performance; (2) The ability to extract view features may decrease due to low-quality views, especially those that are inaccurately imputed. To overcome these challenges, in this paper, we introduce a novel IMVC framework, called soft label collaborative view consistency enhancement (SLC_CE). Firstly, we leverage the encoders of Transformers to construct a soft-label view information interaction module, which fully utilizes soft-labels to enhance view feature embeddings. Secondly, we employ soft labels to collaboratively impute missing features, addressing the incomplete multi-view data problem. Finally, we implement a consistency enhancement strategy across multi-level view features and soft labels to ensure high-quality feature extraction and imputation. Extensive experiments on several benchmark datasets demonstrate that the proposed SLC_CE method outperforms other state-of-the-art methods in real IMVC tasks.
Citation: Zhang J, Tang J (2025) Soft label collaborative view consistency enhancement with application to incomplete multi-view clustering. PLoS One 20(7): e0326852. https://doi.org/10.1371/journal.pone.0326852
Editor: Zhe Liu, Xinyu University, CHINA
Received: November 5, 2024; Accepted: June 5, 2025; Published: July 1, 2025
Copyright: © 2025 Zhang, Tang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: (1) The Aloi-100 dataset underlying the results presented in the study is available from Geusebroek at https://github.com/youweiliang/Multi-view_Graph_Learning/blob/master/data/ALOI_100.mat. (2) The Scene15 dataset underlying the results presented in the study is available from Oliva, Torralba, Fei-Fei Li, Perona, and Lazebnik at https://github.com/QinghaiZheng1992/Code-for-UGLTL/blob/master/dataset/scene15.mat. (3) The MNISTUSPS dataset underlying the results presented in the study is available from U.S. Postal Service at https://github.com/YangSkywalker/L1-MvDA-VC/blob/main/Data/MNIST-USPS.mat. (4) The NoisyMNIST dataset underlying the results presented in the study is available from Louisiana State University at https://github.com/fariba87/noisyMNIST/tree/main/noisyMNIST/noisyMNIST.
Funding: This work was supported by the Changzhou Science and Technology Program (CE20215029). The funders provided support for the decision to publish and the preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Multi-view clustering (MVC) is a well-known unsupervised learning technology that divides instances into clusters by utilizing their feature representations. These views can be derived from different sensors, domains, or feature extractors, providing a more comprehensive perspective of each instance [1–4]. The MVC technology [5–9] is fundamentally based on the assumption that all view data are fully available. However, in many real-world situations, multi-view data is frequently incomplete due to sensor malfunctions or missing information during collection. This poses significant challenges for directly applying MVC techniques to incomplete multi-view data.
To address this challenge, many incomplete multi-view clustering (IMVC) methods have been developed in recent years. Existing IMVC techniques [10–13] can be grouped into three main categories: matrix factorization-based IMVC, kernel learning-based IMVC, and graph learning-based IMVC. IMVC approaches based on matrix factorization [10,13–15] focus on decomposing the multi-view data matrix to recover missing views and uncover shared representations. Wang et al. [16] fully explored spectral perturbation theory and then applied a tailored matrix completion approach to handle the similarity matrices of incomplete multi-view data. Rai et al. [15] adopted the non-negative matrix factorization (NMF) method to exploit the intrinsic geometric structure of the data distribution in each view. Kernel learning-based IMVC methods [11,17] cope with missing data by constructing a kernel matrix and then applying imputation techniques to estimate the missing values. For example, Liu et al. [17] integrated the imputation of incomplete kernel matrices with multiple kernel alignment for clustering in a unified framework. Graph-based methods [11,12,18] construct similarity graphs to represent relationships between data instances, leveraging the geometric structure of the graph to propagate information and handle missing data. Zhao et al. [12] employed unrestricted anchors to reconstruct relationships in high missing-rate data and integrated graph convolutional networks (GCNs) to obtain graph embeddings for clustering incomplete multi-view data. However, these aforementioned methods rely heavily on the quality of the initial multi-view data and thus cannot fully capture the complex relationships between views.
Benefiting from the powerful feature representation capabilities of deep neural networks (DNNs), several deep IMVC methods [19–23] have been developed to deal with incomplete multi-view data. Autoencoder-based methods [24,25] use DNNs to learn feature representations and reconstruct missing views. Choudhury et al. [24] first imputed missing inputs using the k-nearest neighbor rule, and then preserved the structure of the input data in the latent space by incorporating Sammon’s stress as a regularizer in the objective function of the autoencoder. GAN-based deep IMVC methods [26–28] generate missing data through adversarial learning. Zhou et al. [26] employed adversarial learning and attention mechanisms to align latent feature distributions and quantify the importance of the modalities, respectively. With the development of contrastive learning, it has been integrated into deep IMVC frameworks to learn consistent representations across views [29,30]. In [29], consistency learning is performed by maximizing mutual information between different views through contrastive learning, while missing views are recovered by minimizing conditional entropy through dual prediction. Despite the impressive progress of these methods, they still face issues with inaccurate imputation and low-quality feature extraction.
To mitigate these limitations, we introduce a novel IMVC framework, called soft label collaborative view consistency enhancement (SLC_CE). As illustrated in Fig 1, the proposed SLC_CE method is designed to leverage the synergy between multiple views and soft labels, enabling accurate recovery of missing views. The proposed method designs an information interaction module by using soft-label information to enhance view feature embedding. In addition, to address incomplete multi-view data, we employ generated soft labels to recover missing view features using the k-nearest neighbor approach. Finally, to ensure the quality of view feature extraction and missing data recovery, we adopt a consistency enhancement strategy to constrain soft labels and multi-level view features. Extensive experimental results show the effectiveness of the proposed method in IMVC tasks.
The contributions of this work can be summarized as follows:
- We propose an information interaction module, which enriches view feature embeddings by utilizing soft labels. This effectively promotes interaction between views, thereby learning more robust feature representations. Meanwhile, our method uses soft-label information to collaboratively impute missing features across views, ensuring that the imputation process is guided by learned feature complementarity and consistency.
- We adopt a consistency enhancement strategy to constrain soft labels and multi-level view features. This helps maintain the quality of feature extraction and imputation and thus reduces the negative impact of low-confidence soft labels.
- Extensive experimental results on four incomplete multi-view datasets demonstrate the effectiveness and robustness of our proposed SLC_CE method compared to other state-of-the-art methods in complex IMVC tasks.
2 Related work
In this section, we briefly review related work on contrastive learning-based MVC, Transformer-based MVC, and IMVC methods.
2.1 Contrastive learning-based MVC
Contrastive learning is a well-established and effective unsupervised representation learning method, known for its ability to generalize effectively across different types of data representations [31–33]. Inspired by contrastive learning, contrastive multi-view learning has been proposed in the past few years [23,29,34]. For example, Tian et al. [35] applied contrastive learning to maximize mutual information between representations of different views, facilitating the learning of shared information across these views. Contrastive learning aims to increase the similarity between positive pairs of representations while minimizing the similarity between negative pairs, which closely aligns with clustering objectives. The method in [36] used contrastive learning to align multi-view representations obtained from view-specific encoders, and then fused these aligned representations for single-view clustering. Moreover, Xu et al. [37] introduced an approach where multi-view representations are initially aligned using a parameter-shared network, and then contrastive learning is applied to ensure consistency between multi-view features and semantic labels. These contrastive multi-view learning methods highlight the flexibility of contrastive learning techniques in multi-view clustering models, providing a promising approach to improving both representation learning and clustering outcomes in multi-view scenarios.
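The cross-view contrastive objective described above can be made concrete with a short sketch. The following NumPy implementation of a standard InfoNCE-style loss between two views is illustrative only (the function name, shapes, and temperature value are our assumptions, not taken from any cited method):

```python
import numpy as np

def info_nce_loss(z1, z2, tau=0.5):
    """InfoNCE loss between two views' embeddings of shape (N, d).

    Each sample's embedding in view 1 is pulled toward the same sample's
    embedding in view 2 (positive pair on the diagonal) and pushed away
    from all other samples in view 2 (negatives)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                                  # (N, N) cosine similarities
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                     # -log softmax of positives

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
# nearly identical views -> positives dominate -> low loss
low = info_nce_loss(z, z + 0.01 * rng.standard_normal((8, 16)))
# unrelated views -> loss close to log(N)
high = info_nce_loss(z, rng.standard_normal((8, 16)))
assert low < high
```

Minimizing this quantity drives the two views of the same sample together, which is exactly the "positive pair" alignment that the multi-view methods above exploit.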
Although contrastive learning has achieved notable progress in IMVC tasks, it still encounters several challenges, particularly those arising from feature distribution discrepancies and view misalignment. Due to the differences in the distribution of multi-view data, existing contrastive learning methods cannot effectively capture and align the shared information between different views. Additionally, these methods often emphasize learning single-view features while neglecting global consistency and precise alignment between views. This oversight may result in suboptimal performance when handling complex multi-view data.
2.2 Transformer-based MVC
Attention was first introduced in sequence-to-sequence tasks to help models focus on the most informative parts of the input representations. The Transformer architecture [38] relies entirely on attention mechanisms, capturing global dependencies between input and output sequences. The Vision Transformer [39] extends the Transformer architecture to image classification by treating non-overlapping image patches of moderate size as input sequences, similar to the use of word tokens in translation tasks. Hierarchical Transformers [40,41] then introduced a novel technique using shifted image patch windows and variational patch segmentation strategies. They shift windows over non-overlapping patches to capture information from each patch combination, while variational patch segmentation (also known as patch merging) ensures that the learning model incorporates local regions into the broader image context.
Recently, the Transformer has been applied to real IMVC tasks [22,42,43]. Its attention mechanisms establish associations across positions to capture global contextual features. Transformer-based IMVC methods can learn relationships between different views through attention mechanisms, thereby enhancing clustering performance. The attention mechanisms dynamically learn key features and interactions within each view, and multi-head attention further strengthens the modeling of relationships between different views, leading to more accurate clustering results. Therefore, we introduce the Transformer to enhance feature representation capabilities in this work.
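The building block behind all of these models is the scaled dot-product attention of Vaswani et al. [38]. A minimal NumPy sketch (the toy shapes and the single-head, self-attention setup are our simplifications):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    Returns the attended output and the attention weight matrix."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise compatibility
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 8))                      # a toy "view token" sequence
out, w = scaled_dot_product_attention(x, x, x)       # self-attention
assert out.shape == (5, 8)
assert np.allclose(w.sum(axis=1), 1.0)               # each row is a distribution
```

Because every position attends to every other position, the output at each token mixes information from the whole sequence, which is what lets Transformer-based IMVC methods relate features across views.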
2.3 Incomplete multi-view clustering (IMVC)
Incomplete multi-view clustering (IMVC) focuses on improving clustering performance in scenarios where multi-view data are incomplete. One widely used approach is to extract a shared subspace from incomplete data using matrix factorization. A seminal method, called partial multi-view clustering (PVC) [44], directly computes a common latent representation for complete instances while deriving view-specific latent representations for incomplete samples through matrix decomposition. Following this line, several matrix decomposition-based IMVC methods have been developed in recent years. For example, Rai et al. [15] proposed a graph-regularized non-negative matrix factorization method based on PVC. Hu et al. [45] proposed a doubly aligned incomplete multi-view clustering (DAIMC) method, which employs weighted semi-non-negative matrix factorization with l2,1-regularized regression to extract a shared representation. An alternative strategy in IMVC involves inferring missing samples. Wen et al. [46] developed a unified embedding alignment framework (UEAF) that addresses missing data by using an error matrix and reverse graph regularization to both complete the data and identify common structures. Later, Wen et al. [47] explored high-order correlations across multiple views using tensor constraints, thereby learning similarity across multi-view graphs while recovering missing instances. A subspace clustering method has also been proposed to jointly perform data imputation and self-representation learning [48]. Inspired by generative adversarial networks (GANs) [49], Wang et al. [20] introduced a generative partial multi-view clustering approach that leverages GAN models to fill in missing data. More recently, [23] proposed an IMVC framework combining consistency learning with data recovery, and Lin et al. [29] presented a more generalized approach to learning representations from incomplete multi-view data.
Although these IMVC methods have demonstrated impressive performance, they often entail high computational costs and risk compromising data fidelity. The inherent complexity of feature extraction, alignment, and missing data inference across multiple views further hinders their scalability to large-scale datasets. Additionally, handling incomplete data can introduce noise or lead to the loss of important information, reducing data fidelity and impacting clustering performance. Therefore, preserving data integrity while improving computational efficiency poses a substantial challenge in IMVC applications.
3 Method
In this section, we introduce the proposed SLC_CE method for implementing IMVC tasks in detail.
3.1 Notations
Formally, let $\{X^v \in \mathbb{R}^{N \times d_v}\}_{v=1}^{V}$ represent the multi-view data, where $N$ is the number of samples and $d_v$ is the feature dimensionality of the $v$-th view. Here, $X^v$ denotes the $v$-th view, and 'NaN' represents missing instances. The parameter $K$ is the cluster number.
3.2 Overall framework
Fig 1 illustrates the overall framework of the proposed SLC_CE method. First, the proposed model employs an information interaction Transformer to enable interactive learning between soft labels and view information, aiming to fully utilize soft-label information when extracting the features of multi-view data. To cope with incomplete data, we adopt soft-label information in collaboration with the multi-view data, using the k-nearest neighbor algorithm to generate the missing view features. Finally, to ensure the quality of view feature extraction and missing data recovery, we employ a consistency enhancement strategy to ensure the accuracy of the generated soft labels and multi-level view features.
3.3 Information interaction transformer
As shown in Fig 1, we first learn the embedding of the multi-view data $\{X^v\}_{v=1}^{V}$. The features of different views are embedded into a common feature space. For a given sample $x_i^v$ from $X^v$, the embedding vector $e_i^v \in \mathbb{R}^{d_e}$ can be expressed as $e_i^v = f_e^v(x_i^v)$, where $d_e$ represents the dimension of the embedding features. We then stack the embedding vectors to obtain the original multi-view embedding sequence $E = \{E^v\}_{v=1}^{V}$, which is further used as the input vector of the Transformer. Note that for incomplete multi-view data, we adopt the soft-label co-interpolation method (as detailed in Sect 3.4) to generate the embeddings of the missing views, ensuring that $E$ is complete in all views. At the same time, the extracted view feature embedding $E^v$ is fed into the Transformer to enhance the view feature embedding. Therefore, we have

$$Z^v = \mathrm{Transformer}_{fv}\big(f_e^v(X^v)\big),$$

where $f_e^v(\cdot)$ is a fully connected network, $\mathrm{Transformer}_{fv}$ is the first layer of the Transformer, $X^v$ is the incomplete multi-view data, and $Z^v$ is the view feature embedding after $\mathrm{Transformer}_{fv}$. Here, an adaptive fusion layer is introduced to fuse the information from multiple views into a shared view feature $Z^c$. The fusion process can be formulated as follows:

$$Z^c = \sum_{v=1}^{V} \frac{\exp(\gamma w^v)}{\sum_{u=1}^{V} \exp(\gamma w^u)}\, Z^v,$$

where $w^v$ represents the learnable weight and $\gamma$ is the adjustment factor. By interacting with the shared view feature $Z^c$ to explore the correlations between the soft labels and the view embeddings, $\mathrm{Transformer}_{fl}$ attempts to obtain complementary information from the soft labels. This process results in the enhanced soft label $Q$ and feature embedding $\hat{Z}^c$ as follows:

$$\big[Q, \hat{Z}^c\big] = \mathrm{Transformer}_{fl}\big([\,Q_0 \,\|\, Z^c\,]\big),$$

where $\|$ is the concatenation operation and $Q$ denotes the enhanced cluster soft labels. Subsequently, the output features of $\mathrm{Transformer}_{fl}$ are propagated into the second layer.

The second layer is designed to extract high-level shared features, which is achieved by promoting the interaction and fusion between the soft-label information and the view features extracted from the first layer. Therefore, it obtains a more discriminative representation of the multi-view data. This layer incorporates two Transformer blocks, denoted as $\mathrm{Transformer}_{sv}$ and $\mathrm{Transformer}_{sl}$. $\mathrm{Transformer}_{sv}$ is used to enhance information across views and extract a high-level multi-view embedding $H^v$, the enhanced representation of the views obtained by interacting with the shared soft-label feature $S^c$ and analyzing view correlations. Thus, we have

$$H^v = \mathrm{Transformer}_{sv}\big([\,Z^v \,\|\, g_l(S^c)\,]\big),$$

where $g_l(\cdot)$ is a linear layer serving as a projection function designed to map vectors from the soft-label feature space to the view feature space. Correspondingly, $\mathrm{Transformer}_{sl}$ is employed to complement information across soft labels and extract high-level soft-label vectors $S$ by leveraging the shared feature $\hat{Z}^c$ and discerning soft-label correlations, as follows:

$$S = \mathrm{Transformer}_{sl}\big([\,S^c \,\|\, g_v(\hat{Z}^c)\,]\big),$$

where $g_v(\cdot)$ is a linear layer serving as a projection function to map vectors from the view feature space to the soft-label feature space. Through the propagation of the vectors $Z^v$, $S^c$, and $\hat{Z}^c$ among the Transformer blocks, we facilitate the sharing of information between the view and soft-label feature spaces, thereby extracting more refined and effective features of views and soft labels.
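The adaptive fusion layer described above, which combines per-view features using learnable weights and an adjustment factor, can be sketched in a few lines. This NumPy illustration assumes a softmax-normalized weighting; the parameterization and shapes are our assumptions, not the paper's exact formulation:

```python
import numpy as np

def adaptive_fusion(view_feats, w, gamma=1.0):
    """Fuse per-view features (V, N, d) into one shared feature (N, d).

    w:     learnable per-view weights of shape (V,)
    gamma: adjustment factor scaling the weight logits before softmax."""
    alpha = np.exp(gamma * w)
    alpha /= alpha.sum()                          # normalized view weights
    return np.tensordot(alpha, view_feats, axes=1)

rng = np.random.default_rng(2)
feats = rng.standard_normal((3, 10, 16))          # V=3 views, N=10 samples, d=16
w = np.zeros(3)                                   # equal weights at initialization
fused = adaptive_fusion(feats, w)
assert fused.shape == (10, 16)
assert np.allclose(fused, feats.mean(axis=0))     # equal weights = plain average
```

With equal weights the fusion reduces to averaging the views; during training the weights would be learned so that more informative views contribute more to the shared feature.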
3.4 Soft-label collaborative imputation
It is well known that when some samples of multi-view data are missing, we cannot effectively learn the embedded features. Most existing methods try to use the available views to complete the missing views and thus improve feature extraction performance when samples are missing. However, most of these methods rely solely on the k-nearest neighbor algorithm for completion. Therefore, in this work, we make full use of the soft-label information in cooperation with the k-nearest neighbor method, using the clustered soft-label vector $Q$ to help generate the missing views. Specifically, for a sample $i$, let $o_i$ represent the index set of its existing views and $u_i$ represent the index set of its missing views. To use the original multi-view embedding $E$ to supplement the missing features of sample $i$, we first find the k-nearest neighbors in the projected soft-label feature space. The neighbor set $D_i$ can be constructed as follows:

$$D_i = \mathrm{TopK}_{j \neq i}\big(-\,d(q_i, q_j)\big),$$

where $\mathrm{TopK}(\cdot)$ is a function designed to identify the indices of the top $K$ soft labels based on the smallest distance between embedding vectors and soft-label vectors. Then, we employ a statistical method to describe the distribution of the missing views. We assume that the missing views $\{e_j^{u} \mid j \in D_i\}$ satisfy a multivariate Gaussian distribution $\mathcal{N}(\mu, \Sigma)$, whose mean vector and covariance matrix are estimated as follows:

$$\mu = \frac{1}{|D_i|}\sum_{j \in D_i} e_j^{u}, \qquad \Sigma = \frac{1}{|D_i|}\sum_{j \in D_i}\big(e_j^{u} - \mu\big)\big(e_j^{u} - \mu\big)^{\top}.$$

For the missing views, we sample from this distribution several times and substitute the missing views with the sampled results. Consequently, we can obtain the complete embeddings for the incomplete multi-view data. By reconstructing the missing multi-view data, our proposed method further enhances its performance in incomplete-information clustering.
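The imputation scheme above can be sketched as follows: neighbors are selected by distance in the soft-label space, a Gaussian is fitted to their embeddings of the missing view, and the missing embedding is sampled from it. This is a simplified single-sample NumPy illustration that uses a diagonal Gaussian instead of a full covariance matrix; all names and shapes are our assumptions:

```python
import numpy as np

def impute_missing_view(soft_labels, view_embed, observed_idx, target, k=3, seed=0):
    """Impute one sample's missing view embedding.

    soft_labels:  (N, K) soft cluster assignments of all samples
    view_embed:   (N, d) embeddings of the view that `target` is missing
    observed_idx: indices of samples for which this view is observed
    target:       index of the sample whose view is missing"""
    # 1) k-nearest neighbors of `target` in the soft-label space
    dists = np.linalg.norm(soft_labels[observed_idx] - soft_labels[target], axis=1)
    nbrs = np.asarray(observed_idx)[np.argsort(dists)[:k]]
    # 2) fit a (diagonal) Gaussian to the neighbors' embeddings
    mu = view_embed[nbrs].mean(axis=0)
    sigma = view_embed[nbrs].std(axis=0)
    # 3) draw the imputed embedding from that distribution
    rng = np.random.default_rng(seed)
    return rng.normal(mu, sigma)

rng = np.random.default_rng(3)
labels = rng.random((6, 4))                        # toy soft labels, N=6, K=4
embed = rng.standard_normal((6, 8))                # toy view embeddings, d=8
filled = impute_missing_view(labels, embed, observed_idx=[0, 1, 2, 3, 4], target=5)
assert filled.shape == (8,)
```

The point of the soft-label guidance is step 1: neighbors are chosen by cluster similarity rather than raw feature distance, so the fitted distribution reflects samples that likely belong to the same cluster.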
3.5 Soft-label and view consistency enhancement
Using the aforementioned soft-label view information interaction Transformer, we extract two multi-view embeddings $Z^v$ and $H^v$ from different layers, respectively. To enable our encoder to effectively extract the features, it is crucial to enhance the discriminative ability of these embeddings. Specifically, according to the consistency between multiple views, the embedded features of the same sample from different views should be aligned. In addition, we can fully utilize the consistent features of multi-view data to improve the discriminative ability of $Z^v$ and $H^v$. Taking these factors into consideration, we introduce the embedding enhancement of multi-level view features. To learn more effective embeddings $Z^v$ and $H^v$, we use contrastive learning to align the embeddings of the same sample from different views. Therefore, we employ the following loss function in the proposed model:

$$\mathcal{L}_{v} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{V}\sum_{n \neq m}\log\frac{\exp\big(\mathrm{sim}(h_i^m, h_i^n)/\tau_v\big)}{\sum_{j=1}^{N}\sum_{v=1}^{V}\mathbb{1}_{[(j,v)\neq(i,m)]}\exp\big(\mathrm{sim}(h_i^m, h_j^v)/\tau_v\big)},$$

where $m$ and $n$ refer to the indices of the $m$-th and $n$-th views, respectively, $\mathrm{sim}(\cdot,\cdot)$ represents the cosine similarity, and $\tau_v$ is the temperature parameter.

As previously mentioned, we utilize clustering soft labels to assist in completing the missing data. This means that the quality of the recovered data depends largely on the accuracy of the soft labels. Here, we adopt contrastive learning to optimize the soft clustering process. For the $m$-th view, $Q^m(:,j)$ has $(Vk-1)$ pairs, of which $(V-1)$ pairs are positive and the remaining $V(k-1)$ pairs are negative. Thereby, the contrastive loss can be defined as follows:

$$\ell(m,n) = -\frac{1}{k}\sum_{j=1}^{k}\log\frac{\exp\big(d(Q^m(:,j), Q^n(:,j))/\tau_l\big)}{\sum_{r=1}^{k}\sum_{u \in \{m,n\}}\exp\big(d(Q^m(:,j), Q^u(:,r))/\tau_l\big)-\exp(1/\tau_l)}.$$

Similarly, our refined soft-label feature consistency enhancement is optimized as follows:

$$\mathcal{L}_{l} = \frac{1}{2}\sum_{m=1}^{V}\sum_{n \neq m}\ell(m,n),$$

where $d(\cdot,\cdot)$ represents the cosine distance used to measure the similarity between two labels, and $\tau_l$ is the temperature parameter. Moreover, we use the cross entropy as a regularization term to avoid all samples being assigned to a single cluster. Thus, the label consistency learning is formulated as follows:

$$\mathcal{L}_{r} = \sum_{m=1}^{V}\sum_{j=1}^{k} p_j^m \log p_j^m, \qquad \text{where } p_j^m = \frac{1}{N}\sum_{i=1}^{N} Q^m(i,j).$$

After fine-tuning the labels through contrastive learning, the similarity between positive pairs is increased, resulting in latent features with a more distinct clustering structure.

Therefore, the full loss function of the proposed method is given as follows:

$$\mathcal{L} = \mathcal{L}_{v} + \alpha\,\mathcal{L}_{l} + \beta\,\mathcal{L}_{r}.$$
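The cross-entropy regularization term, which prevents all samples from collapsing into a single cluster, can be sketched directly. This NumPy illustration is ours (function name and shapes assumed): it computes the negative entropy of the marginal cluster distribution, which is high when one cluster absorbs everything and low when clusters are balanced:

```python
import numpy as np

def cluster_entropy_regularizer(Q, eps=1e-12):
    """Negative entropy of the average cluster assignment.

    Q: (N, K) soft labels. Minimizing sum_j p_j log p_j pushes the
    marginal cluster distribution p toward uniform, so a degenerate
    one-cluster solution is penalized."""
    p = Q.mean(axis=0)                         # marginal cluster frequencies
    return np.sum(p * np.log(p + eps))

balanced = np.full((100, 4), 0.25)             # perfectly uniform assignments
collapsed = np.zeros((100, 4))
collapsed[:, 0] = 1.0                          # every sample in cluster 0
# collapse yields a higher (worse) value than balance
assert cluster_entropy_regularizer(collapsed) > cluster_entropy_regularizer(balanced)
```

For the balanced case the value is $\log(1/K) \approx -1.386$ with $K=4$, while the collapsed case gives roughly 0, so the regularizer clearly favors spread-out assignments.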
In this paper, the optimization of the objective function shown in Eq 18 is an end-to-end learning process. The total training process of the proposed model is summarized in Algorithm 1.
Algorithm 1. The proposed SLC_CE algorithm.
4 Experimental results and analysis
4.1 Datasets and metrics
We conducted experiments on four benchmark multi-view datasets: Aloi-100, Scene15, MNISTUSPS, and NoisyMNIST, as summarized in Table 1. To evaluate the robustness of our proposed method, we assessed the clustering performance of the proposed method under different missing rates, specifically [0.1, 0.3, 0.5, 0.7], across all datasets. The clustering performance was measured using three widely used clustering metrics: accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI). Generally speaking, higher values for these indicators correspond to better clustering performance.
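Of these metrics, clustering accuracy (ACC) requires first matching predicted cluster ids to ground-truth class ids, since cluster labels are arbitrary. The sketch below uses brute-force search over label permutations, which is only practical for a small number of clusters (in practice the Hungarian algorithm, e.g. SciPy's `linear_sum_assignment`, is used); names and the toy data are ours:

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one mappings from cluster ids to classes.

    Assumes labels are 0..K-1; brute-force over K! permutations."""
    labels = np.unique(y_true)
    best = 0.0
    for perm in permutations(labels):
        mapped = np.array([perm[c] for c in y_pred])   # relabel clusters
        best = max(best, float(np.mean(mapped == y_true)))
    return best

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 2, 2, 0, 0])                  # same partition, relabeled
assert clustering_accuracy(y_true, y_pred) == 1.0
```

The same partition under a different labeling scores 1.0, which is exactly the invariance ACC is meant to have.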
4.2 Comparison methods
In this experiment, we evaluated the proposed SLC_CE method against the following state-of-the-art IMVC techniques: COMPLETER [23] addresses missing views by minimizing the conditional entropy between different views through dual prediction. DCP [29] develops a unified framework to learn consistent representations across views and recover missing views in incomplete multi-view representation learning. CBG [50] proposes a flexible and efficient incomplete large-scale multi-view clustering method based on a bipartite graph framework to address high complexity and expensive time consumption. CPSPAN [51] employs pair-observed data alignment to guide the construction of instance-to-instance correspondences across views. PIMVC [52] proposes a novel graph-regularized projective consensus representation learning model for IMVC. APADC [53] introduces an imputation-free deep IMVC method that incorporates distribution alignment in feature learning. DIVIDE [54] utilizes random walks to identify data pairs on a global scale, rather than locally, effectively reducing false negatives in contrastive learning. SCSL [55] proposes a sample-level cross-view similarity learning method for IMVC. DVIMC [56] introduces a variational autoencoder-based method to address the missing data problem in IMVC. VITAL [57] learns both common and specific information by modeling each sample as a Gaussian distribution, using variational inference for contrastive learning across views.
4.3 Implementation details
We employed a multi-layer perceptron (MLP) with fully connected (Fc) layers as the encoder to extract the features. For each view, the encoder structure was set as follows: Input–Fc500–Fc2000–Fc2000–Fc10. The temperature parameter was fixed at 1 for all experiments. We used the Adam optimizer with a learning rate of 1.0e-4. Due to differences in the distributions of the datasets, the hyperparameters were adjusted accordingly. For the Aloi-100 dataset, we used a batch size of 512, trained for 200 epochs, and set α to 0.1 and β to 1. For the Scene15 dataset, we used a batch size of 256, trained for 200 epochs, and set α to 0.01 and β to 1. For the MNIST-USPS dataset, we used a batch size of 512, trained for 200 epochs, and set α to 0.1 and β to 1. For the NoisyMNIST dataset, we used a batch size of 1024, trained for 200 epochs, and set α to 0.01 and β to 1. All experiments were carried out on an Ubuntu system with an NVIDIA GeForce RTX 3090 GPU (24.0 GB memory).
4.4 Experimental results
To evaluate the performance of our proposed SLC_CE method in IMVC tasks, we compared it with several state-of-the-art methods. Table 2 presents the clustering results of our SLC_CE method and the baseline models on four incomplete datasets. The best results are highlighted in bold, and the second-best results are underlined. From the experimental results, we can draw the following observations:
- 1) It can be observed that our method outperforms other competitors, such as CBG, PIMVC, and SCSL. Traditional IMVC methods often rely on shallow learning models to process multi-view data, which limits their ability to capture nonlinear relationships and higher-order features. Most existing methods attempt to fill in missing views by leveraging available views, primarily using the k-nearest neighbor (KNN) algorithm to complete the missing data and improve feature extraction. However, these methods struggle to fully capture the complex structural information inherent in multi-view data. In contrast, our method combines soft-label information with KNN for data completion and employs the clustered soft-label vector Q to recover the missing views, allowing our approach to handle complex real-world scenarios more effectively. The information interaction module leverages soft labels to enhance the feature embeddings across views, improving inter-view interactions and learning more robust feature representations. These designs ultimately lead to superior clustering performance, demonstrating the effectiveness of our soft-label imputation strategy.
- 2) Unlike other state-of-the-art deep IMVC approaches such as CPSPAN, DCP, and APADC, which predict missing views but do not fully leverage label information, our approach uses soft labels to fill in missing features across views more effectively, guided by the learned feature relationships and consistency. This strategy significantly boosts the model's performance and enhances its capability to handle missing data.
- 3) We can observe from the results that our approach surpasses IMVC methods such as DIVIDE and COMPLETER. While these methods also employ contrastive learning strategies to enhance view consistency, our approach leverages a multi-level contrastive learning strategy to enforce consistency between soft labels and multi-level view features. This strategy not only preserves the quality of feature extraction and imputation, but also mitigates the negative effects of low-confidence soft labels, resulting in more robust performance.
4.5 Ablation study
In this subsection, we evaluated the contribution of each component in our method under the same experimental setting. Specifically, we constructed three variants of the proposed method: (A) excluding the soft-label and view consistency enhancement part (w/o SV_CE); (B) removing the soft-label view interaction Transformer and replacing it with a multi-layer perceptron (MLP) (w/o SV_IT); (C) eliminating the soft-label collaborative part in the missing-value recovery process (w/o SLC). Table 3 shows the ablation results of our proposed method on four different datasets. It can be seen that removing any component from our method, or replacing a proposed module with an alternative one, significantly degrades the clustering performance. This shows that each component of our proposed method plays a vital role in IMVC tasks. Specifically, the SV_CE component performs consistency feature alignment at both the view feature and soft clustering levels through a contrastive learning strategy, learning feature consistency more effectively; this helps reduce the negative impact of low-confidence soft labels and maintain the quality of feature extraction and imputation. The SV_IT component plays a key role during feature extraction: we flexibly employ the attention mechanism to interactively learn view features and use soft clustering to maximize the utilization of soft labels, thereby enriching the view feature embeddings. This effectively promotes interaction between views and thus learns more powerful feature representations. The SLC component incorporates soft-label information to guide the recovery of missing values, ensuring that the model accurately restores missing samples.
4.6 Convergence analysis
In this subsection, we conducted a convergence analysis on four benchmark datasets. Fig 2 illustrates the convergence of the proposed SLC_CE method on different multi-view datasets, each with a missing rate of 0.7. It can be seen that the loss decreases quickly in the first 50 epochs, then continues to decline gradually with minor fluctuations before eventually stabilizing. These convergence results demonstrate the reliability and effectiveness of the proposed method in tackling the incomplete multi-view clustering (IMVC) problem, showing consistent performance even under challenging conditions.
4.7 Parameter analysis
In this subsection, we conducted experiments on four datasets to evaluate the parameter sensitivity of the proposed method, with the missing rate set to 0.7. The proposed model includes two trade-off coefficients, $\alpha$ and $\beta$, in Eq 18, with values ranging from $10^{-3}$ to 10. Fig 3 shows the experimental results of our proposed method on four incomplete multi-view datasets. The results indicate that our method maintains stable clustering performance across a wide range of parameter values, demonstrating its insensitivity to these parameters in different real applications.
4.8 Visualization
To intuitively assess the effectiveness of the proposed SLC_CE model, we employed the t-SNE algorithm to visualize the distribution of the latent features learned by the model with a missing rate of 0.7. As illustrated in Fig 4, the generated clusters are distinctly separated with clear boundaries, demonstrating that our method effectively captures meaningful features from the multi-view data. The clarity of these clustering results further confirms the robustness and effectiveness of the proposed method in handling complex clustering tasks.
4.9 Complexity analysis
In this subsection, we evaluate the computational efficiency of our method by measuring the number of parameters, running time, and floating-point operations (FLOPs), and compare it with several state-of-the-art deep incomplete multi-view clustering approaches. The results in Table 4 show that our method outperforms the other IMVC methods on all three measures. Combined with its clustering accuracy, this indicates that the proposed model maintains competitive computational efficiency, improving its overall effectiveness and scalability.
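As a rough illustration of how quantities like those in Table 4 can be obtained, the sketch below counts parameters and multiply-add FLOPs for a plain fully connected encoder and times one forward pass. The layer sizes are illustrative assumptions; actual measurements would use the full model and a profiler.

```python
import time
import numpy as np

def mlp_cost(layer_sizes):
    """Parameter and per-sample FLOP count for a fully connected encoder.
    A d_in x d_out layer has d_in*d_out weights plus d_out biases, and
    costs roughly 2*d_in*d_out FLOPs per sample (one multiply + one add)."""
    pairs = list(zip(layer_sizes, layer_sizes[1:]))
    params = sum(i * o + o for i, o in pairs)
    flops = sum(2 * i * o for i, o in pairs)
    return params, flops

def time_forward(layer_sizes, batch=256, seed=0):
    """Wall-clock time of one batched forward pass with random weights."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(size=(i, o)) for i, o in zip(layer_sizes, layer_sizes[1:])]
    x = rng.normal(size=(batch, layer_sizes[0]))
    start = time.perf_counter()
    for W in Ws:
        x = np.maximum(x @ W, 0.0)     # linear layer followed by ReLU
    return time.perf_counter() - start
```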
5 Conclusion
In this paper, we introduce a soft label collaborative view consistency enhancement (SLC_CE) method for IMVC. Our approach leverages a soft-label view information interaction Transformer to fully exploit soft-label information for enhancing view feature embeddings. To handle the challenge of incomplete multi-view data, we employ the k-nearest neighbor method, guided by soft-label information, to recover missing view features across views. Additionally, we incorporate a consistency enhancement strategy to ensure accurate view feature extraction and missing data recovery by constraining soft labels and multi-level view features. Extensive experimental results have demonstrated that our SLC_CE method outperforms other state-of-the-art methods in clustering tasks involving incomplete multi-view data.
Although the proposed method achieves satisfactory clustering performance, it has several limitations. Specifically, it employs traditional autoencoders as the backbone network, which limits its feature extraction capability. In future work, we will therefore incorporate more powerful feature extraction models, such as multimodal vision-language models, to enhance multi-view feature representations. In addition, the semi-paired problem in multi-view data is common in many applications, and adapting the proposed method to handle it remains a significant challenge.