
SMILE: Semi-supervised multi-view classification based on dynamical fusion

  • Hui Yang,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation School of Cyberspace Security, Hunan College of Information, Changsha, Hunan, China

  • Linyan Kang,

    Roles Data curation, Formal analysis, Validation, Visualization, Writing – review & editing

    Affiliation School of Science, Hangzhou Dianzi University, Hangzhou, Zhejiang, China

  • Xun Che

    Roles Conceptualization, Funding acquisition, Supervision, Writing – review & editing

    * chexun@njust.edu.cn

    Affiliation School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China

Abstract

Semi-supervised multi-view classification plays a crucial role in understanding and utilizing existing multi-view data, especially in domains like medical diagnosis and autonomous driving. However, conventional semi-supervised multi-view classification methods often merely fuse features from multiple views without significantly improving classification performance. To address this issue, we propose a dynamic fusion approach for Semi-supervised MultI-view cLassification (SMILE). This approach leverages a high-level semantic mapping module to extract discriminative features from each view, reducing redundant features. Furthermore, it introduces a dynamic fusion module that dynamically assesses the quality of different views of different samples, diminishing the negative impact of low-quality views. We compare our method with six competitive methods on four datasets; our method exhibits distinct advantages on the classification task, with significant improvements across various evaluation metrics. Visualization experiments demonstrate that our approach learns classification-friendly representations.

1 Introduction

With the development of multimedia technology, most real-life data exists in the form of multiple views/modalities. For example, in autonomous driving, different sensors, such as ultrasonic radar, cameras, and millimeter-wave radar, perceive the surrounding environment, and the data collected by each sensor is regarded as a view [1–5]. A video consists of audio, images, and text, with each medium acting as a view [6]. Fully understanding and utilizing such multi-view data can better mine the data and drive innovation and progress. However, with continuously increasing manual annotation costs and massive amounts of data, there is an urgent need for a more effective way to process multi-view data [7–10]. Therefore, this work focuses on semi-supervised multi-view classification.

The main challenge in semi-supervised multi-view classification is how to fully utilize a small amount of labeled data and a large amount of unlabeled data to obtain a complete multi-view representation (one containing both cross-view shared information and complementary information). Existing semi-supervised multi-view classification methods can be mainly divided into traditional methods and deep learning-based methods [11, 12]. Traditional methods mostly rely on a shared representation obtained after multi-view fusion. Due to the limited ability of shallow methods to extract high-level semantic information from data, their classification performance depends heavily on the original data. With the rapid development of deep learning, deep semi-supervised multi-view classification methods exploit the powerful representation ability of deep models to learn multi-view fusion representations that benefit classification, thus overcoming the shortcomings of traditional methods; as a result, deep learning-based semi-supervised multi-view classification has attracted tremendous attention in the community [13, 14].

Existing deep learning-based semi-supervised multi-view classification methods usually utilize autoencoders or convolutional neural networks to extract features from multiple views, maximize the shared representation to obtain fusion representations, and then apply semi-supervised strategies on the fused representations to generate supervised information for unlabeled data [15, 16]. Although these methods have made some progress, they simply concatenate the features of multiple views or obtain fusion features through a neural network, and cannot effectively estimate the contributions of different views. Multi-view data comes from different sources and contains not only shared information but also a large amount of view-specific information; the informativeness of each view differs from sample to sample. Therefore, dynamically fusing the views of each sample improves the quality of the fusion representations and, in turn, the classification performance of the model.

To address this issue, we propose a novel method called Semi-supervised MultI-view cLassification based on dynamic fusion (SMILE). The method introduces a view-specific autoencoder (AE) for each view to extract low-level features. Since these low-level features are heavily tied to the reconstruction task and are not ideal for classification, we incorporate a high-level semantic mapping head for each view to transform them into high-level features better suited for classification. Additionally, a view confidence module evaluates the informativeness of each view for different samples, allowing for the dynamic fusion of views. This mitigates the negative impact of low-quality views, ensuring the model remains robust to variations in view quality. The main contributions of this work are summarized as follows:

  • This work proposes a novel semi-supervised multi-view classification algorithm that is robust to low-quality views and significantly improves classification performance.
  • The introduced view confidence module dynamically evaluates the informativeness of each view of each sample, refines discriminative features, and dynamically fuses the features of multiple views.
  • Visualization experiments show that the proposed method learns fusion representations with a clear classification structure; ablation experiments confirm the importance of the dynamic fusion module and the high-level feature mapping head; and classification comparisons with six competitive methods on four benchmark datasets demonstrate the effectiveness and superiority of the proposed method.

2 Related work

Most real-life data exists in the form of multi-view/multimodal data. For example, a video contains three modalities: audio, image, and text; data collected by different sensors in autonomous driving constitute multiple modalities; and multiple views in medical diagnosis are formed by different medical images (such as X-ray, MRI, and CT scans) of the same lesion [17–21]. Multi-view data contains richer information than single-view data, and fully exploiting it can better mine the knowledge behind the data, thereby promoting social development and progress [22, 23]. However, with the explosive growth of data, existing datasets commonly consist of numerous unlabeled samples and only a small amount of labeled data. Due to the high cost of manual labeling, how to fully utilize the small amount of labeled data together with the numerous unlabeled data has become a hot and difficult research topic [24–27]. Facing the dual challenges of multiple views and few labels, a more effective method is urgently needed, so semi-supervised multi-view learning has emerged and become a research focus.

Existing semi-supervised multi-view learning methods fall roughly into two groups: traditional methods and deep learning-based methods. Traditional methods primarily fall into the following categories: (1) co-training techniques [19, 28, 29]; (2) graph-based strategies [30–32]; and (3) regression-based approaches [33–35]. Co-training, initially developed for dual-view data, starts by training a classifier on labeled data. It then assigns labels to the unlabeled data in each view. Subsequently, the most confidently predicted samples from one classifier are added to the training set of the other classifier, and this cycle is repeated [36]. Graph-based methods use unlabeled and labeled data as vertices of a consensus graph and propagate label information along the edges. For example, the methods in [31, 32] first construct a graph for each view, then learn view weights to obtain a consensus graph, and finally use label propagation to predict labels for unlabeled data. In line with this, Nie et al. [31] propose a parameter-free method to simultaneously learn the consensus graph matrix, the common label indicator matrix, and the view weights. Regression-based techniques derive projection matrices for each view, thereby exploring view-specific complementary information; they use the label matrix as a cross-view regression target to investigate consistency among views [33]. While these methods demonstrate significant advancements, they are not without limitations. Co-training approaches tend to overlook the inherent diversity of multiple views, treating them as equivalent. This oversight not only fails to rectify the inaccuracies produced by low-quality views but also amplifies these errors within the model, ultimately degrading performance. Furthermore, these methods are constrained in their ability to perform representation learning and cannot effectively explore high-level semantic information within the data. In contrast, deep learning-based approaches, which excel at representation learning, have garnered significant attention [6, 37, 38].

Pseudo-labeling methods demonstrate superior performance compared to many other techniques in deep semi-supervised learning due to their lightweight and effective design. This advantage may be attributed to the substantial number of parameters that must be tuned in deep neural networks; more parameters require a greater volume of labeled data, and pseudo-labeling methods provide labeled data directly, which is particularly advantageous for deep learning models [25, 27]. For example, Wang et al. [1] proposed generating pseudo-labels on the fused representation of multiple views as supervised information to guide the learning of single-view representations. With more supervised information, the representation learning of individual views improves, which in turn facilitates the generation of better fused representations. It follows that the quality of pseudo-labeling depends on the quality of the input representations. However, these methods typically employ straightforward fusion techniques to integrate features from multiple views, neglecting the variability in informativeness across samples. This oversight diminishes the distinguishability of the fused representations. Consequently, the representations generated by existing methods fail to mitigate the adverse effects of low-quality views, yielding limited discriminative features and ultimately constraining improvements in classification performance.

Unlike these approaches, this study dynamically evaluates the informativeness of different views for different samples and then dynamically fuses the features of multiple views to generate classification-friendly representations.

3 Our method

3.1 Preliminaries

In this section, the proposed semi-supervised multi-view classification algorithm based on dynamic fusion is presented. For ease of exposition, we first describe the notation. Given a multi-view dataset with V views, the labeled data is defined as {({x_i^v}_{v=1}^V, y_i)}_{i=1}^L, where V denotes the number of views, y_i is the label of the i-th sample, and L is the number of labeled samples. The unlabeled data is defined as {{x_j^v}_{v=1}^V}_{j=1}^U, where U is the number of unlabeled samples. x_i^v ∈ R^{d_v}, i.e., the feature dimension of the v-th view is d_v. The purpose of semi-supervised multi-view classification is to predict correct labels for the unlabeled samples using a small amount of labeled data and a large amount of unlabeled data. The framework of the proposed method is shown in Fig 1.

Fig 1. Illustration of our proposed SMILE.

The pipeline is as follows: (a) input multi-view data; (b) obtain view-specific representations through view-specific autoencoders; (c) the high-level semantic projection module projects view-specific representations into high-level features; (d) dynamically fuse the view-specific high-level features.

https://doi.org/10.1371/journal.pone.0320831.g001

3.2 Multi-view data reconstruction

The raw multi-view data contains a large number of redundant features, so representative features are first learned from the raw data. The autoencoder is a model that maps raw data into a feature space and is widely used because of its simplicity and effectiveness [39, 40]. Therefore, in this work, a view-specific autoencoder is designed for each view to extract its features. Specifically, for the v-th view of sample i, a nonlinear encoder function E^v is introduced to map the view into a view-specific representation:

z_i^v = E^v(x_i^v),    (1)

where z_i^v is the obtained low-level feature, which the decoder reconstructs to obtain the reconstructed view. Specifically, the decoder is denoted by D^v, and the reconstructed view is given by:

x̂_i^v = D^v(z_i^v).    (2)

This work utilizes a reconstruction loss to optimize this process, which is defined as:

L_rec = Σ_{v=1}^V Σ_i ||x_i^v − x̂_i^v||_2^2.    (3)
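As a minimal sketch of Eqs 1–3, the following NumPy code builds one autoencoder per view and sums the reconstruction errors over views and samples. The single-layer weights and the tanh nonlinearity are illustrative stand-ins for the paper's MLP-based view-specific networks, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_autoencoder(d_in, d_hidden):
    """One encoder/decoder weight pair for a single view (illustrative)."""
    return {
        "W_enc": rng.normal(0, 0.1, (d_in, d_hidden)),
        "W_dec": rng.normal(0, 0.1, (d_hidden, d_in)),
    }

def encode(ae, x):
    # Eq. (1): z = E(x); tanh as a stand-in nonlinearity
    return np.tanh(x @ ae["W_enc"])

def decode(ae, z):
    # Eq. (2): x_hat = D(z)
    return z @ ae["W_dec"]

def reconstruction_loss(views, aes):
    # Eq. (3): squared reconstruction error summed over all views
    loss = 0.0
    for x, ae in zip(views, aes):
        x_hat = decode(ae, encode(ae, x))
        loss += np.sum((x - x_hat) ** 2)
    return loss

# Toy multi-view batch: 2 views of 8 samples, with dimensions 16 and 24
views = [rng.normal(size=(8, 16)), rng.normal(size=(8, 24))]
aes = [init_autoencoder(16, 4), init_autoencoder(24, 4)]
print(reconstruction_loss(views, aes))
```

In practice each `E^v`/`D^v` would be a trained network; here the weights are random and only the loss computation itself is exercised.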

3.3 Dynamic multi-view fusion

The low-level features extracted by the autoencoder from a single view mainly capture information relevant to reconstruction, so performing classification directly in the low-level feature space faces some challenges. To obtain more class-specific features and thereby improve classification performance, we add an additional fully connected layer on top of the low-level features to map them into a high-level representation space: h_i^v = F^v(z_i^v), where h_i^v is the obtained high-level feature and F^v is the high-level semantic projection module of the v-th view.

In multi-view data, the informativeness of each view varies from sample to sample [41, 42]; therefore, understanding this variation is the key to multi-view classification, as it determines the model's ability to adapt to changes in modality quality. Inspired by the literature [43], this work introduces the True-Class-Probability (TCP) [44] to quantify the classification confidence of different views, which is closely related to the amount of classification information a view carries. When a view's classification confidence is low, the current classification is unreliable, which correspondingly means the view is less informative. To obtain the classification confidence of the views, for each view v, this work designs a classifier as a probabilistic model that transforms the observed samples into a predictive distribution p^v ∈ R^C, where C is the number of classes. The classifier can be trained in the maximum likelihood estimation framework by minimizing the KL (Kullback–Leibler) divergence between the predicted distribution and the true distribution:

L_cls^v = −Σ_{c=1}^C y_c log p_c^v,    (4)

where Eq 4 is also known as the cross-entropy function. The maximum class probability (MCP), max_c p_c^v, can be regarded as the classifier's confidence in the current prediction. Although this is effective for classification, it tends to make the model overconfident (assigning high confidence scores even to erroneous predictions). Therefore, to obtain more reliable classification confidence, TCP is used in this work. Unlike MCP, which utilizes the maximum softmax output as a measure of confidence, TCP uses the softmax output probability corresponding to the true label as the confidence. Specifically, for each view v, given the corresponding prediction distribution p^v and label y, the TCP can be formalized as:

TCP^v = ⟨y, p^v⟩ = p_y^v,    (5)

where ⟨·,·⟩ denotes the inner product. When the model's prediction is correct, the output of TCP agrees with that of MCP: both equal the maximal softmax output and reflect classification confidence well. When the prediction is wrong, however, TCP reflects the classification quality better, because it approaches a lower value and thereby exposes the model's tendency to make incorrect predictions. Although TCP gives more reliable confidence, it cannot be used directly at test time or on unlabeled samples because it requires the true label. Hence, a confidence regression network C^v is introduced for each view v to estimate the TCP. Because TCP ∈ [0, 1], a sigmoid activation function is added to the last layer of the network, and the confidence regression network is trained with the loss:

L_conf^v = Σ_i (c_i^v − TCP_i^v)^2,    (6)

where c_i^v is the output of the confidence regression network for the v-th view. The TCP can then be approximated with a view-specific classifier and a confidence regression network. Thus, the fusion representation of multiple views is given by:

h_i = ⊕_{v=1}^V (c_i^v · h_i^v),    (7)

where ⊕ denotes the concatenation operator.
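To make the dynamic fusion concrete, the following NumPy sketch computes the TCP of Eq 5 and the confidence-weighted concatenation of Eq 7. The two-view setup and the use of the TCP values themselves as fusion weights (rather than the learned outputs of the confidence regression network) are simplifying assumptions for illustration only:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def tcp(probs, labels):
    # Eq. (5): probability assigned to the true class, <y, p>
    return probs[np.arange(len(labels)), labels]

def dynamic_fusion(high_feats, confidences):
    # Eq. (7): scale each view's features by its confidence, then concatenate
    weighted = [c[:, None] * h for h, c in zip(high_feats, confidences)]
    return np.concatenate(weighted, axis=1)

rng = np.random.default_rng(0)
n, C = 6, 3
labels = rng.integers(0, C, size=n)
# Per-view class probabilities and high-level features (two views)
probs = [softmax(rng.normal(size=(n, C))) for _ in range(2)]
high_feats = [rng.normal(size=(n, 8)), rng.normal(size=(n, 8))]
# At training time the confidence network regresses these TCP targets;
# here we use the TCP values directly as fusion weights.
confidences = [tcp(p, labels) for p in probs]
fused = dynamic_fusion(high_feats, confidences)
print(fused.shape)  # (6, 16)
```

A view whose classifier puts little probability mass on the true class gets a weight near zero, so its features contribute little to the fused representation.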

3.4 Objective function

In this work, an additional classifier is trained on the fusion representation with the cross-entropy loss to obtain the final classification result p; therefore, the supervised loss is formulated as follows:

L_sup = −Σ_{i=1}^L Σ_{c=1}^C y_{i,c} log p_{i,c}.    (8)

For unlabeled samples, this work employs a threshold-based method to evaluate the reliability of predicted pseudo-labels and selects trustworthy pseudo-labels for training. The unsupervised loss can be defined as the cross-entropy loss between the pseudo-labels and the model predictions:

L_unsup = Σ_{j=1}^U 1(max_c p_{j,c} ≥ τ) · H(ŷ_j, p_j),    (9)

where pseudo-labels whose maximum class prediction probability equals or exceeds the threshold τ are deemed credible, τ is a predefined hyperparameter, ŷ_j is the pseudo-label, and H(·,·) denotes the cross-entropy loss. Thus, the objective function of the proposed method can be defined as:

L = L_rec + Σ_{v=1}^V (L_cls^v + L_conf^v) + L_sup + λ L_unsup,    (10)

where λ is the balance factor between the losses; it is set to 1 in the experiments in this work.
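The threshold gate of Eq 9 can be sketched as follows; this is an illustrative NumPy version of the selection step, with hypothetical helper names, assuming the unlabeled predictions are rows of softmax probabilities:

```python
import numpy as np

def select_pseudo_labels(probs, tau=0.95):
    """Keep only predictions whose maximum class probability reaches the
    threshold tau; return a boolean mask and the argmax pseudo-labels."""
    conf = probs.max(axis=1)
    mask = conf >= tau
    pseudo = probs.argmax(axis=1)
    return mask, pseudo

def unsupervised_loss(probs, tau=0.95, eps=1e-12):
    # Cross-entropy between confident pseudo-labels and the predictions,
    # averaged over the retained (confident) samples only
    mask, pseudo = select_pseudo_labels(probs, tau)
    if not mask.any():
        return 0.0
    picked = probs[mask, pseudo[mask]]
    return float(-np.mean(np.log(picked + eps)))

probs = np.array([
    [0.97, 0.02, 0.01],   # confident -> kept
    [0.50, 0.30, 0.20],   # below tau -> discarded
    [0.05, 0.01, 0.94],   # below tau = 0.95 -> discarded
])
mask, pseudo = select_pseudo_labels(probs, tau=0.95)
print(mask.tolist(), pseudo.tolist())
```

With τ = 0.95 (the value used in the experiments), only the first sample passes the gate, matching the paper's intent of training on trustworthy pseudo-labels only.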

4 Experimental setup

4.1 Datasets

  • Handwritten (https://archive.ics.uci.edu/ml/datasets/Multiple+Features) is a handwritten digits dataset comprising 2,000 samples categorized into 10 classes, with six different views.
  • Scene15 [45] is a dataset that consists of images categorized into 15 indoor and outdoor scene categories. In this work, we employ GIST, PHOG, and LBP features, utilizing these three views with a total of 4,485 samples to construct the dataset.
  • Out-Scene [46] dataset is specifically designed for scene classification tasks and comprises 2,688 outdoor scene images, which are categorized into eight classes. Each sample within the dataset includes four views.
  • GRAZ02 (http://www.emt.tugraz.at/pinz/data/GRAZ_02) is a widely utilized benchmark for object categorization and recognition tasks, consisting of 1,474 samples across four classes, with each sample encompassing six views.

4.2 Comparison methods

This study conducts comparative experiments to assess the proposed method against six state-of-the-art semi-supervised multi-view classification methods.

  • AMGL [31] is developed for multiview clustering and semi-supervised tasks, enabling the automatic learning of optimal graph weights without the need for additional parameters. This approach incorporates heterogeneous features to align with actual data distributions and ensures the attainment of a globally optimal solution.
  • MLAN [32] concurrently executes clustering or semi-supervised classification alongside local structure learning. The model autonomously determines optimal weights for each view without the need for explicit weight specifications or penalty parameters. Furthermore, it is capable of producing reliable graphs even in the presence of noisy data.
  • MVAR [33] employs regression-based loss functions built on the ℓ2,1 matrix norm, integrating them in a linear fashion. It features an efficient and convergent algorithm designed for minimizing the non-smooth ℓ2,1-norm, rendering it appropriate for large-scale datasets. Furthermore, MVAR automatically adjusts weights to accommodate low-quality views and streamlines the prediction process for new data.
  • JCD [30] learns both a common label matrix and view-specific classifiers. It proposes a novel probabilistic square hinge loss to handle uncertain sample contributions and uses power mean to weight losses from different views.
  • LACK [47] presents a label-driven auto-weighted approach that assesses the significance of views through labeling rather than through data representation. This methodology enables LACK to acquire labels with enhanced accuracy in view weights by decomposing the overarching problem into three smaller, more manageable sub-problems that can be solved efficiently.
  • IMvGCN [48] integrates Graph Convolutional Networks (GCN) with multi-view learning to enhance interpretability and performance. It combines reconstruction error and Laplacian embedding to address multi-view learning from both feature and topology perspectives.
  • SMILE-L is a variant of our proposed method in which the model is trained with the labeled data only.

4.2.1 Experimental details.

In the experiments, the view-specific feature extraction network is implemented as a 3-layer multilayer perceptron (MLP). We use the Adam optimizer with weight decay to adjust the learnable parameters, setting the learning rate to 1e–3. The balancing parameter λ is selected from a candidate set, and the threshold τ is fixed at 0.95. The proposed framework is implemented on the PyTorch platform. The experiments were conducted on a computer equipped with an Intel i9-13900HX CPU, an Nvidia GeForce RTX 4060 GPU, and 32 GB of RAM. To evaluate performance, we use classification accuracy (ACC), macro F1-score (F1), and area under the curve (AUC); higher values indicate better performance.
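For reference, the ACC and macro F1 metrics follow their standard definitions; the NumPy sketch below computes both from scratch (in practice one would typically use scikit-learn's `accuracy_score` and `f1_score(average='macro')`):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # ACC: fraction of samples predicted correctly
    return float(np.mean(y_true == y_pred))

def macro_f1(y_true, y_pred, n_classes):
    # Macro F1: per-class F1 averaged with equal class weight
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred, 3))
```

Macro averaging weights all classes equally, which matters on imbalanced datasets where plain accuracy can be dominated by the majority class.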

4.3 Experimental results and analysis

4.3.1 Classification results and display.

In this experiment, to assess the classification performance of our method, we compare it with six competitive semi-supervised multi-view classification methods on four benchmark datasets. The proportion of labeled samples is set to 5%, 10%, and 15%. The ACC, F1, and AUC scores are reported in Tables 1–3, respectively. For ease of observation, the best results are shown in bold. From the experimental results, the following points can be observed:

  1. The ACC and F1 results indicate that while traditional and deep methods each have strengths on different datasets, the proposed method consistently delivers superior performance across nearly all datasets. For instance, on the Scene15 dataset with only 5% labeled samples, our method surpasses the second-best approach by 5.07% in ACC and 5.61% in F1. This performance is attributed to the dynamic fusion strategy in the high-level semantic space, which minimizes conflicts between reconstruction-oriented and category-specific features and reduces the adverse effects of low-quality views.
  2. The AUC results in Table 3 show that our method achieves the best performance on all datasets, indicating good robustness and discriminative ability; the learned representations have a clear classification structure.
  3. SMILE-L demonstrates superior classification performance compared to the six comparison methods across almost all datasets, highlighting its ability to effectively extract task-related features while reducing redundant ones. Furthermore, our full method, SMILE, outperforms SMILE-L, underscoring the necessity and effectiveness of training with unlabeled data. This improvement stems from the valuable information in the unlabeled data, which allows the model to gain a more comprehensive understanding of the data distribution, thereby enhancing downstream performance.

Table 1. Accuracy results (%) compared among methods. The LACK algorithm cannot run on the GRAZ02 dataset, so its results are replaced with "—". The best results are highlighted in bold.

https://doi.org/10.1371/journal.pone.0320831.t001

Table 2. F1 results (%) compared among methods. The LACK algorithm cannot run on the GRAZ02 dataset, so its results are replaced with "—". The best results are highlighted in bold.

https://doi.org/10.1371/journal.pone.0320831.t002

Table 3. AUC results (%) compared among methods. The LACK algorithm cannot run on the GRAZ02 dataset, so its results are replaced with "—". The best results are highlighted in bold.

https://doi.org/10.1371/journal.pone.0320831.t003

4.3.2 Visualization analysis.

To demonstrate the learned fusion representations more intuitively, this work visualizes the original dataset features and the fusion representations learned by our method using the t-SNE [49] and UMAP dimensionality reduction methods. t-SNE focuses more on local similarity, while UMAP focuses more on maintaining global structure. The t-SNE visualizations of the original views and the fused representations are shown in Figs 2 and 3, respectively. The UMAP visualizations of the original views and the fused representations are shown in Figs 4 and 5, respectively.

Fig 2. The t-SNE visualization of the original data.

https://doi.org/10.1371/journal.pone.0320831.g002

Fig 3. The t-SNE visualization of the fused features extracted in this work.

https://doi.org/10.1371/journal.pone.0320831.g003

Fig 4. The UMAP visualization of the original data.

https://doi.org/10.1371/journal.pone.0320831.g004

Fig 5. The UMAP visualization of the fused features extracted in this work.

https://doi.org/10.1371/journal.pone.0320831.g005

Observation of the visualization results reveals that the raw data are distributed in a complex manner, with confusion between classes, making it difficult to clearly distinguish between different classes, especially on the two datasets Scene15 and GRAZ02. This suggests that the original data may have more overlapping and mixing in the high-dimensional space. The learned fusion representation, however, possesses obvious inter-class separation and intra-class compactness, indicating that our method is able to better distinguish different classes in the low-dimensional representation of the data, and has achieved significant improvement in data representation.

4.3.3 Ablation study.

This experiment further analyzes the importance of the high-level semantic mapping module and the dynamic fusion module to explore their roles in model performance. The results of the ablation experiments are reported in Tables 4 and 5 to clearly demonstrate the impact of these two modules on classification performance. The following conclusions can be drawn from the results: both modules play an active role in improving the classification performance of the model. The high-level semantic mapping module facilitates the extraction of classification features and avoids the influence of redundant features, while the dynamic fusion module avoids the influence of low-quality views and handles the correlation and weight assignment among the views of different samples. The experimental results demonstrate the importance and effectiveness of both modules.

Table 4. Ablation study of “w/” or “w/o” high-level semantic mapping module.

https://doi.org/10.1371/journal.pone.0320831.t004

Table 5. Ablation study of “w/” or “w/o” dynamic fusion module.

https://doi.org/10.1371/journal.pone.0320831.t005

Discussion

The proposed method achieves dynamic fusion in multi-view settings under semi-supervised scenarios, which helps mitigate performance degradation caused by low-quality views. However, we acknowledge a key limitation of this work: the reliance on pseudo-label quality and the approximation accuracy of the true class probability (TCP). While pseudo-labeling is employed to address the scarcity of labeled data, the accuracy of these pseudo labels is inherently difficult to evaluate in the absence of ground truth. In extreme cases of low-quality views, the method may produce more inaccurate predictions, which could further compromise the TCP approximation and, in turn, hinder the extraction of view-specific features. Addressing the challenge of evaluating and improving pseudo-label quality remains an open and complex problem, and it represents a key direction for our future research efforts.

Conclusion

With advancements in multimedia technology, the prevalence of multi-view data has increased, offering richer information for analysis and understanding while also presenting several challenges. Issues such as limited labeling information, effective multi-view fusion, and discriminative feature extraction need to be addressed. This work introduces a semi-supervised multi-view classification method based on dynamic fusion, which excels in extracting discriminative features and dynamically fusing multiple views from various samples. The high-level semantic mapping module reduces the impact of redundant features and retains important classification-related features, while the dynamic fusion module assigns weights to different views for each sample, minimizing the effects of noisy and low-quality views and exploring view associations effectively. Quantitative experiments validate the algorithm’s effectiveness and superiority, and visualization experiments demonstrate that the learned fusion features have a strong classification structure. Ablation studies highlight the importance and effectiveness of each module. Future work will focus on improving semi-supervised multi-view classification, enhancing pseudo-labeling accuracy, discovering more discriminative features, and achieving better fusion of multiple views for optimal classification representations.

Acknowledgments

We thank all reviewers for their valuable suggestions on this paper.

References

  1. Wang X, Fu L, Zhang Y, Wang Y, Li Z. MMatch: semi-supervised discriminative representation learning for multi-view classification. IEEE Trans Circuits Syst Video Technol. 2022;1–1.
  2. Wang X, Wang Y, Wang Y, Huang A, Liu J. Trusted semi-supervised multi-view classification with contrastive learning. IEEE Trans Multimedia. 2024;26:8268–78.
  3. Tian Y, Sun S, Tang J. Multi-view teacher–student network. Neural Netw. 2022;146:69–84. pmid:34839092
  4. Xu J, Ren Y, Tang H, Yang Z, Pan L, Yang Y, et al. Self-supervised discriminative feature learning for deep clustering. IEEE Trans Knowl Data Eng. 2022.
  5. Mao Y, Zhang J, Qi H, Wang L. DNN-MVL: DNN-multi-view-learning-based recover block missing data in a dam safety monitoring system. Sensors (Basel). 2019;19(13):2895. pmid:31261982
  6. Zhou H, Gong M, Wang S, Gao Y, Zhao Z. Smgcl: Semi-supervised multi-view graph contrastive learning. Knowledge-Based Syst. 2023;260:110120. https://doi.org/10.1016/j.knosys.2022.110120
  7. Chao G, Sun S. Semi-supervised multi-view maximum entropy discrimination with expectation Laplacian regularization. Inf Fusion. 2019;45:296–306.
  8. Jiang B, Zhang C, Zhong Y, Liu Y, Zhang Y, Wu X, et al. Adaptive collaborative fusion for multi-view semi-supervised classification. Inf Fusion. 2023;96:37–50.
  9. Xu J, Zheng H, Wang J, Li D, Fang X. Recognition of EEG signal motor imagery intention based on deep multi-view feature learning. Sensors. 2020;20(12):3496. https://doi.org/10.3390/s20123496 pmid:32575798
  10. Alsulami N, Althobaiti H, Alafif T. MV-MFF: multi-view multi-feature fusion model for pneumonia classification. Diagnostics. 2024;14(14):1566. https://doi.org/10.3390/diagnostics14141566 pmid:39061703
  11. Zhang B, Qiang Q, Wang F, Nie F. Fast multi-view semi-supervised learning with learned graph. IEEE Trans Knowl Data Eng. 2020;34(1):286–299.
  12. Li S, Li WT, Wang W. Co-gcn for multi-view semi-supervised learning. In: AAAI Conference on Artificial Intelligence; 2020.
  13. Wang Xl, Zhu Zf, Song Y, Fu Hj. GRNet: graph-based remodeling network for multi-view semi-supervised classification. Pattern Recognit Lett. 2021;151:95–102.
  14. Wang X, Wang Y, Ke G, Wang Y, Hong X. Knowledge distillation-driven semi-supervised multi-view classification. Inf Fusion. 2024;103:102098.
  15. Noroozi V, Bahaadini S, Zheng L, Xie S, Shao W, Philip SY. Semi-supervised deep representation learning for multi-view problems. In: 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA. 2018;56–64.
  16. Jia X, Jing XY, Zhu X, Chen S, Du B, Cai Z, et al. Semi-supervised multi-view deep discriminant representation learning. IEEE Trans Pattern Anal Mach Intell. 2020;43(7):2496–509. https://doi.ieeecomputersociety.org/10.1109/TPAMI.2020.2973634
  17. Yu J, Li J, Yu Z, Huang Q. Multimodal transformer with multi-view visual representation for image captioning. IEEE Trans Circuits Syst Video Technol. 2019;30(12):4467–4480.
  18. Cao X, Zhang C, Fu H, Liu S, Zhang H. Diversity-induced multi-view subspace clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA. 2015;586–94.
  19. Cheng Y, Zhao X, Cai R, Li Z, Huang K, Rui Y, et al. Semi-supervised multimodal deep learning for RGB-D object recognition. In: International Joint Conference on Artificial Intelligence. 2016;3345–51.
  20. 20. Liu Y, Liu Y, Duan Y. MVG-Net: LiDAR point cloud semantic segmentation network integrating multi-view images. Remote Sensing. 2024;16(15):2821. 16152821
  21. 21. Wu J, Yao Y, Zhang G, Li X, Peng B. Difficult airway assessment based on multi-view metric learning. Bioengineering. 2024;11(7):703. pmid:39061785
  22. 22. Liu C, Wen J, Liu Y, Huang C, Wu Z, Luo X, et al. Masked two-channel decoupling framework for incomplete multi-view weak multi-label learning. In: NIPS ’23: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023; 32387–400, Article No: 1406.
  23. 23. Liu C, Xu G, Wen J, Liu Y, Huang C, Xu Y. Partial multi-view multi-label classification via semantic invariance learning and prototype modeling. In: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:32253–67, 2024.
  24. 24. Li J, Xiong C, Hoi SC. Comatch: Semi-supervised learning with contrastive graph regularization. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada. 2021;9455–64.
  25. 25. Wang Y, Chen H, Heng Q, Hou W, Savvides M, Shinozaki T, et al. FreeMatch: self-adaptive thresholding for semi-supervised learning. arXiv. preprint. arXiv.2205.07246.
  26. 26. Zhang B, Wang Y, Hou W, Wu H, Wang J, Okumura M, et al. Flexmatch: boosting semi-supervised learning with curriculum pseudo labeling. In: Advances in Neural Information Processing Systems. 2021;34:18408–18419.
  27. 27. Sohn K, Berthelot D, Carlini N, Zhang Z, Zhang H, Raffel CA, et al. FixMatch: simplifying semi-supervised learning with consistency and confidence. In: Advances in Neural Information Processing Systems. Curran Associates, Inc.; 2020(33);596–608.
  28. 28. Ma F, Meng D, Dong X, Yang Y. Self-paced multi-view co-training. J Mach Learn Res. 2020;21;1–38.
  29. 29. Liu LY, Huang P, Min F. Safe multi-view co-training for semi-supervised regression. In: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA). IEEE; 2022;1–10.
  30. 30. Zhuge W, Hou C, Peng S, Yi D. Joint consensus and diversity for multi-view semi-supervised classification. Mach Learn. 2020;109(3):445–65.
  31. 31. Nie F, Li J, Li X. Parameter-free auto-weighted multiple graph learning: a framework for multiview clustering and semi-supervised classification. In: International Joint Conference on Artificial Intelligence; 2016.
  32. 32. Nie F, Cai G, Li J, Li X. Auto-weighted multi-view learning for image clustering and semi-supervised classification. IEEE Trans Image Process. 2018;27(3):1501–11.
  33. 33. Tao H, Hou C, Nie F, Zhu J, Yi D. Scalable multi-view semi-supervised classification via adaptive regression. IEEE Trans Image Process. 2017;26(9):4283–96.
  34. 34. Huang A, Wang Z, Zheng Y, Zhao T, Lin CW. Embedding regularizer learning for multi-view semi-supervised classification. IEEE Trans Image Process. 2021;30:6997–7011.
  35. 35. Huang H, Liang N, Yan W, Yang Z, Sun W. Partially shared semi-supervised deep matrix factorization with multi-view data. arXiv:201200993 [cs]. 2020;
  36. 36. Kumar A, Daumé H. A co-training approach for multi-view spectral clustering. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11); 2011;393–400.
  37. 37. Guo W, Wang Z, Du W. Robust semi-supervised multi-view graph learning with sharable and individual structure. Pattern Recognit. 2023;140:109565.
  38. 38. Yu Y, Zhou G, Huang H, Xie S, Zhao Q. A semi-supervised label-driven auto-weighted strategy for multi-view data classification. Knowl-Based Syst. 2022;255:109694.
  39. 39. Kang M, Lee K, Lee YH, Suh C. Autoencoder-based graph construction for semi-supervised learning. In: Vedaldi A, Bischof H, Brox T, Frahm JM (Editors). Computer Vision – ECCV 2020. Springer International Publishing; 2020;12369:500–17.
  40. 40. Zhang C, Liu Y, Fu H. Ae2-Nets: autoencoder in autoencoder networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019;2577–85.
  41. 41. Rideaux R, Storrs KR, Maiello G, Welchman AE. How multisensory neurons solve causal inference. Proc Natl Acad Sci U S A. 2021;118(32):e2106235118. pmid:34349023
  42. 42. Hou H, Zheng Q, Zhao Y, Pouget A, Gu Y. Neural correlates of optimal multisensory decision making under time-varying reliabilities with an invariant linear probabilistic population code. Neuron. 2019;104(5):1010–21.
  43. 43. Han Z, Yang F, Huang J, Zhang C, Yao J. Multimodal dynamics: dynamical fusion for trustworthy multimodal classification. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2022;20675–85.
  44. 44. Corbière C, Thome N, Bar-Hen A, Cord M, Pérez P. Addressing failure prediction by learning model confidence. In: Advances in Neural Information Processing Systems. 2019;32.
  45. 45. Cheng G, Han J, Lu X. Remote sensing image scene classification: Benchmark and state of the art. Proc IEEE. 2017;105(10):1865–1883.
  46. 46. Hu Z, Nie F, Wang R, Li X. Multi-view spectral clustering via integrating nonnegative embedding and spectral embedding. Inf Fusion. 2020;55:251–259.
  47. 47. Yu Y, Zhou G, Huang H, Xie S, Zhao Q. A semi-supervised label-driven auto-weighted strategy for multi-view data classification. Knowl-Based Syst. 2022;255:109694.
  48. 48. Wu Z, Lin X, Lin Z, Chen Z, Bai Y, Wang S. Interpretable graph convolutional network for multi-view semi-supervised learning. IEEE Trans Multimedia. 2023;1–14.
  49. 49. Van der Maaten L, Hinton G. Visualizing data using T-SNE. J Mach Learn Res. 2008;9(11).