
CAT: Class-aware adaptive-thresholding for robust semi-supervised domain generalization

Abstract

Domain Generalization (DG) seeks to transfer knowledge from multiple source domains to unseen target domains, even in the presence of domain shifts. Achieving effective generalization typically requires a large and diverse set of labeled source data to learn robust representations that can generalize to new, unseen domains. However, obtaining such high-quality labeled data is often costly and labor-intensive, limiting the practical applicability of DG. To address this, we investigate a more practical and challenging problem: semi-supervised domain generalization (SSDG) under a label-efficient paradigm. In this paper, we propose a novel method, CAT, which leverages semi-supervised learning with limited labeled data to achieve competitive generalization performance under domain shifts. Our method addresses key limitations of previous approaches, such as reliance on fixed thresholds and sensitivity to noisy pseudo-labels. CAT combines adaptive thresholding with noisy label refinement techniques, creating a straightforward yet highly effective solution for SSDG tasks. Specifically, our approach uses flexible thresholding to generate high-quality pseudo-labels with higher class diversity while refining noisy pseudo-labels to improve their reliability. Extensive experiments on multiple benchmark datasets demonstrate the superior performance of our method, with improvements of 3.45% on PACS, 9.47% on OfficeHome, and 10.90% on miniDomainNet datasets, highlighting its effectiveness in achieving robust generalization under domain shifts.

1 Introduction

Most machine learning models assume that the train and test data are drawn from the same distribution, but in practice, the test distribution can frequently shift. Owing to this limitation, such models fail to generalize in real-world applications such as autonomous driving systems or medical applications [1–3]. On the other hand, deep neural networks have shown remarkable success in various classification tasks under fully annotated training conditions. Most deep learning (DL) models require a large amount of labeled data to achieve competitive results. However, in real-world applications, collecting labeled data is challenging due to its substantial cost and the need for human annotation [4–7]. Recently, semi-supervised learning (SSL) [4,6,8] techniques have gained significant attention for their ability to effectively utilize unlabeled data alongside a small amount of labeled data. The main challenge in SSL lies in learning effective representations of unlabeled data in relation to labeled examples to enhance generalization performance. To address this, techniques such as pseudo-labeling [9–11] and consistency regularization [12–14] have proven to be effective. However, these methods are designed primarily for single-source classification tasks, making it difficult for them to capture multiple cross-domain relationships, a critical requirement for domain generalization (DG). Please refer to Fig 1 for an illustration of DG, SSL, and semi-supervised domain generalization (SSDG).

Fig 1. In the typical domain generalization setting, multiple labeled domains are used for training, whereas in semi-supervised learning, a few annotated samples are used together with a large amount of unlabeled data.

But in semi-supervised domain generalization (SSDG), a few annotated samples from multiple domains are used together with a large amount of unlabeled data.

https://doi.org/10.1371/journal.pone.0329799.g001

Domain shift [15–17] presents a significant challenge in deploying deep learning models, especially in critical applications such as medical imaging and self-driving systems, where domain shifts can lead to severe risks. To address this, domain generalization (DG) methods have been developed [18–21]. Most DG methods rely on supervised learning, where a model is trained on multiple labeled source domains. However, in real-world scenarios, obtaining sufficient labeled data for these domains is often impractical and burdensome.

On the other hand, unlabeled samples from source domains are easier to obtain and more abundant. The challenge lies in their variability and the presence of unknown classes. Most SSL methods leverage these abundant unlabeled samples with the guidance of labeled samples to generate pseudo-labels. Producing accurate pseudo-labels is essential for effectively utilizing unlabeled data in model training. Nevertheless, existing DG methods heavily depend on fully annotated source samples to perform well, limiting their applicability in real-world scenarios. In this paper, we explore the potential of the SSL paradigm in DG settings, referred to as semi-supervised domain generalization (SSDG). Fig 1 illustrates the differences between SSL, DG, and SSDG methods.

As described above, pseudo-labeling is effective for utilizing unlabeled samples, but many methods rely on fixed thresholding. For example, FixMatch [13] uses a fixed threshold for all classes, which often discards too many unlabeled samples with correct pseudo-labels. In SSDG settings, StyleMatch [22] extends the same fixed-threshold strategy as FixMatch [13], but its performance is similarly limited by the loss of valuable unlabeled samples. Adaptive and dynamic class-dependent thresholding offers a reliable solution to this issue [23–25]. However, these methods are designed for single-domain SSL settings, making multi-domain training—a strict requirement for DG—challenging and often infeasible for achieving successful SSDG.

Our work is closely related to recent advances in SSL and DG. FreeMatch [23] addresses this limitation by introducing a self-adaptive thresholding mechanism based on the model’s learning status. Our approach is inspired by FreeMatch, but we extend its thresholding mechanism to the multi-domain setting by introducing class-domain adaptive thresholds. This allows our method to better capture both class-specific and domain-specific variations in the data, which is essential for robust pseudo-labeling under domain shift. Moreover, AdaMatch [26] presents a unified approach to SSL and unsupervised domain adaptation (UDA) by aligning distributions of weak and strong augmentations and calibrating confidence thresholds using batch statistics. Unlike AdaMatch, which assumes access to unlabeled target data for adaptation, our method tackles the more challenging domain generalization problem, where target domain data is completely unseen during training. Finally, PCL [27] finds that naive application of supervised contrastive learning can degrade generalization due to misaligned sample pairs across domains. To mitigate this, PCL proposes proxy-based contrastive learning to align representations via learned prototypes. We instead retain sample-to-sample contrastive learning but improve it by first identifying noisy pseudo-labeled samples. This enables robust representation learning without explicit proxy usage. Our method also complements this with unsupervised contrastive learning on uncertain samples, allowing us to refine noisy pseudo-labels while preserving domain diversity and structure.

To address these limitations, we propose CAT, an adaptive thresholding method specifically designed for SSDG settings. CAT overcomes the drawbacks of fixed-threshold approaches by employing adaptive class-dependent thresholds tailored for SSDG tasks. We utilize both global and local thresholds, iteratively increasing the thresholds based on the training time steps. This strategy allows the model to capture more correct pseudo-labels compared to strictly fixed thresholds. Local thresholding is employed to ensure variability across class labels and to improve the confidence dynamics for producing pseudo-labels. In parallel, a noisy label refinement module is integrated to further refine pseudo-labels, ensuring higher quality. Additionally, we leverage supervised contrastive learning with the refined pseudo-labels to achieve domain-invariant representations. Experimental results on several benchmarks demonstrate the superiority of our method. Our contributions are threefold:


  • Motivated by the challenges of generating high-quality pseudo-labels for SSDG, we propose a method that produces robust pseudo-labels, effectively mitigating the impact of noise.
  • We introduce CAT, a simple yet effective approach that integrates adaptive thresholding with a noisy label refinement module to achieve superior performance in SSDG settings.
  • Extensive experiments on multiple benchmarks validate the effectiveness of our method. CAT not only outperforms state-of-the-art SSDG methods but also surpasses standalone DG and SSL approaches.

The remainder of this paper is organized as follows: Sect 2 comprehensively reviews related work, Sect 3 details our proposed method, Sect 4 presents the experimental results, demonstrating the effectiveness of our approach compared with other state-of-the-art methods, and Sect 5 concludes the paper by summarizing the findings and providing future directions.

2 Related works

Domain generalization. Domain generalization (DG) aims to train on multiple source domains and transfer to unseen target domains. Most DG settings consider source and target domains to be from different distributions. The main goal is to perform well under this distribution shift, also called domain shift. DG methods can be categorized into domain alignment, meta-learning, adversarial learning, data augmentation, ensemble learning, self-supervised learning, and feature regularization [18]. Domain alignment methods are based on minimizing moments [28], KL-divergence [29], and maximum mean discrepancy [30] to learn domain-invariant representations. In meta-learning-based DG, training data is divided into meta-train and meta-test sets to improve generalization on the meta-test set. Most existing methods are based on episode construction, where source domains are divided into meta-train and meta-test domains to simulate domain shift [31,32]. Another prominent approach is adversarial learning, where the learned features are enforced to be agnostic to domain information [30,33]. In augmentation, most works are related to feature augmentation [20,34,35] or model-based augmentation [36]. Ensemble learning techniques train multiple models with different initializations and utilize their ensemble for prediction; examples are domain-specific neural networks [37,38] and batch normalization [39,40]. Self-supervised learning explores pretext tasks that allow a model to learn invariant features [41,42]. Lastly, regularization methods are based on feature regularization [43] and model regularization [44].

Semi-supervised learning. Semi-supervised learning (SSL) refers to learning from limited labeled data while utilizing abundant unlabeled data. SSL aims to predict data accurately under the assumption that labeled and unlabeled data are from an identical distribution [8,9,45]. Most SSL techniques are based on pseudo-labels [9,10], mean-teacher [46–48], and consistency regularization [12–14]. Besides consistency regularization, entropy-based regularization is also widely used in SSL, where entropy minimization encourages the model to make confident predictions on all samples [49]. Thresholding-based methods such as FixMatch [13], FreeMatch [23], and UDA [50] select samples based on pre-defined thresholds during training, so multiple works have proposed adaptive and dynamic thresholding to alleviate this limitation: DASH [25] adjusts a pre-defined threshold based on the loss on labeled data, and AdaMatch [26] calibrates the threshold using the model’s average confidence on pseudo-labels. Self-training [51–53] methods are also effective in SSL settings; also known as decision-directed learning, their main goal is to place the decision boundary in low-density regions [54].

Semi-supervised domain generalization (SSDG). Semi-supervised domain generalization (SSDG) combines SSL and DG, which is a more difficult setting because a large amount of unlabeled data must be utilized to achieve competitive DG results. One of the most recent works is StyleMatch [22], which utilizes a stochastic classifier to extend FixMatch [13] with multi-view consistency for SSDG. Another line of work utilizes known and unknown classes with a class-adaptive method [55]. MultiMatch [56] extends FixMatch [13] to a multi-task setting by producing high-quality pseudo-labels for SSDG. Although these methods achieve comparable results on SSDG tasks, they are not sufficient for real-world practicability. In Table 1, we provide a comparison between our work and related works such as StyleMatch [22] and MultiMatch [56].

Table 1. Comparison with related semi-supervised domain generalization methods.

https://doi.org/10.1371/journal.pone.0329799.t001

3 Method

Our method, CAT, is a semi-supervised domain generalization (SSDG) approach that dynamically adjusts class-dependent thresholds using global and local thresholding, iteratively increasing them over training steps. This dynamic thresholding helps capture accurate pseudo-labels with fewer errors, while local thresholding enables variability across class labels, improving confidence and class-wise pseudo-label quality. To further refine pseudo-labels, we integrate a noisy label refinement module, which filters out low-quality labels and ensures higher reliability. Additionally, a supervised contrastive learning approach with refined pseudo-labels is employed to learn domain-invariant representations. An overview of our method is given in Fig 2.

Fig 2. The overall pipeline of CAT.

Given a few labeled and many unlabeled samples from multiple source domains, CAT first performs weak and strong augmentations to obtain initial pseudo-labels. Then, CAT employs global and local thresholds that dynamically adapt the thresholding based on class dependency. Finally, a noisy pseudo-label refinement module is employed to obtain noise-free pseudo-labels.

https://doi.org/10.1371/journal.pone.0329799.g002

3.1 Notation and preliminaries

Semi-supervised learning. In SSL settings, we are given a set of N labeled samples drawn from an unknown distribution, consisting of sample-label pairs $\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{N}$, and M unlabeled samples without defined labels, $\mathcal{D}_U = \{u_j\}_{j=1}^{M}$. There are k classes, where $N_k$ and $M_k$ are the numbers of labeled and unlabeled samples in the k-th class, respectively. Without loss of generality, $M \gg N$. The training loss calculated in an SSL algorithm usually contains a supervised loss $\mathcal{L}_s$ and an unsupervised loss $\mathcal{L}_u$. Typically, $\mathcal{L}_s$ is calculated on the labeled samples with a cross-entropy loss. The loss function is defined as:

$\mathcal{L}_s = \frac{1}{N}\sum_{i=1}^{N} \mathcal{H}\big(y_i,\, f(x_i;\theta)\big)$  (1)

Expanding the entropy term: $\mathcal{H}(y, p) = -\sum_{c=1}^{C} y_c \log p_c$.

Here, $p = f(x;\theta)$ is the probability vector produced by the model function f, which is parameterized by θ, for the input x, and $\mathcal{H}$ is the cross-entropy loss. The unsupervised loss $\mathcal{L}_u$ is calculated differently depending on the SSL algorithm. One key example is FixMatch [13], where the unsupervised loss is guided by generated pseudo-labels, eventually reusing the same cross-entropy objective as the supervised loss.
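To make the FixMatch-style pseudo-labeling concrete, the sketch below shows the unsupervised objective: a pseudo-label is taken from the weakly augmented view and applied to the strongly augmented view, masked by a fixed confidence threshold. Function and variable names are ours; the threshold of 0.95 follows the FixMatch paper.

```python
import torch
import torch.nn.functional as F

def fixmatch_unsup_loss(logits_weak, logits_strong, tau=0.95):
    """FixMatch-style unsupervised loss: pseudo-label from the weak view,
    cross-entropy on the strong view, masked by a fixed confidence threshold."""
    probs = torch.softmax(logits_weak.detach(), dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= tau).float()          # same fixed threshold for every class
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (mask * loss).mean()
```

Note how low-confidence samples contribute nothing to the loss; this is exactly the behavior that discards many correctly pseudo-labeled samples and motivates adaptive thresholding.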

Domain generalization. In typical DG settings, we have k source domains, each containing N samples. The inputs x and their corresponding labels y are drawn from a joint distribution. The k source domains are similar but distinct, denoted as $\mathcal{S} = \{S_1, S_2, \dots, S_k\}$. The main goal of DG is to learn a model function f that can leverage these k sources to learn a representation that performs well on unlabeled and unseen target samples, by reducing the domain shift between the source and target domains:

$\min_{f}\; \mathbb{E}_{(x,y)\sim \mathcal{S}}\big[\ell\big(f(x),\, y\big)\big]$  (2)

Here, $\mathbb{E}$ represents the expectation and $\ell$ is the loss function.

Semi-supervised domain generalization. Similar to the conventional DG setting, we have multiple diverse domains from k sources, where each source domain consists of pairs of images and corresponding labels [22,57]. However, in the SSDG setting, each source domain contains only a small number of labeled samples $n_L$, while the remaining samples are unlabeled, denoted as $n_U$, with $n_L \ll n_U$ in each source domain. This setting combines aspects of both SSL and DG. The ultimate goal is to learn a domain-generalizable model using both labeled and unlabeled source data, such that the model performs well on unseen target data. A summary of notations is given in Table 2.

3.2 Class-domain aware thresholding

Due to its simplicity and effectiveness, StyleMatch [22] leverages FixMatch [13] to generate pseudo-labels using a classifier with a fixed threshold. In this work, we revisit FixMatch to better understand the process of selecting unlabeled candidate samples for pseudo-label generation, particularly the fixed confidence threshold. We argue that relying on a fixed threshold may exclude a significant number of unlabeled samples that could receive accurate pseudo-labels, thereby limiting the practical applicability of FixMatch in data-efficient scenarios. Another challenge is that these thresholds are not class-dependent, which makes FixMatch less suited for capturing class-variant information, especially in multi-domain settings. In FixMatch [13], a supervised loss and an unsupervised loss are employed for labeled and unlabeled data, respectively, where the supervised loss corresponds to the standard cross-entropy:

$\mathcal{L}_s = \frac{1}{N}\sum_{k=1}^{N} \mathcal{H}\big(y_k,\, p_k\big)$  (3)

Here, N denotes the number of samples, and $\mathcal{H}(y_k, p_k)$ represents the loss function between the true distribution $y_k$ and the predicted distribution $p_k$. Motivated by the limitations of FixMatch [13] in generating pseudo-labels, we focus on adaptive thresholding, which is less restrictive and more flexible in selecting class-wise samples. Recently, adaptive and dynamic thresholding methods have demonstrated effectiveness in SSL settings [23–25], primarily due to their ability to handle class-dependent samples flexibly. However, in DG it is crucial not only to adaptively select class-dependent samples but also to preserve domain-specific information. This dual requirement is essential for leveraging unlabeled data effectively while maintaining domain and class consistency. Unlike prior methods such as [23,24], which adaptively set class-dependent thresholds without considering domain-specific information, we propose a method that incorporates both class and domain dependencies in pseudo-label selection. In FreeMatch [23], global and local thresholds are set to be both dataset- and class-specific. Inspired by this approach, we extend the concept to simultaneously define domain- and class-dependent thresholds. By incorporating these dual thresholds, our method dynamically selects pseudo-labels based on both class and domain information, thereby maximizing the utility of unlabeled samples in the DG setting.

Data augmentation. We adopt the UDA [50] strategy for data augmentation to obtain weak and strong augmentations. Inspired by FixMatch [13] and FreeMatch [23], we use RandAugment [58] for strong augmentation. Data augmentation is used for generating pseudo-labels on the unlabeled data, followed by an unsupervised loss [23]:

$\mathcal{L}_u = \frac{1}{\mu B}\sum_{b=1}^{\mu B} \mathbb{1}\big(\max(q_b) \ge \tau\big)\; \mathcal{H}\big(\hat{q}_b,\, Q_b\big)$  (4)

Here, $\mathbb{1}(\max(q_b) \ge \tau)$ is the indicator function for confidence-based thresholding [13], where $q_b$ denotes the model’s prediction on the weakly augmented view, $\hat{q}_b = \arg\max(q_b)$ the resulting pseudo-label, and $Q_b$ the prediction on the strongly augmented view.

Class-specific global and local thresholding. Following [23], we utilize a global threshold that is iteratively increased: a low initial threshold engages many samples early in training, and as the threshold rises the model stably discards incorrect pseudo-labels. At the t-th time step, the model’s average confidence on the unlabeled data is used to compute the global threshold $\tau_t$. $\tau_t$ is initialized as 1/C, where C is the number of classes in each source domain. Then $\tau_t$ is adjusted at each time step t [23] via an exponential moving average (EMA):

$\tau_t = \lambda\,\tau_{t-1} + (1-\lambda)\,\frac{1}{\mu B}\sum_{b=1}^{\mu B} \max(q_b)$  (5)

Here, λ is the momentum decay of the EMA. To adjust the global threshold in a class-specific manner, the expectation of the model’s prediction for each class c over the source domain is used to estimate the class-specific learning status:

$\tilde{p}_t(c) = \lambda\,\tilde{p}_{t-1}(c) + (1-\lambda)\,\frac{1}{\mu B}\sum_{b=1}^{\mu B} q_b(c)$  (6)

Here, $\tilde{p}_t = [\tilde{p}_t(1), \dots, \tilde{p}_t(C)]$ is the list of estimates over all existing classes. We then apply max normalization to obtain a self-adaptive threshold for each class c:

$\tau_t(c) = \frac{\tilde{p}_t(c)}{\max_{c'} \tilde{p}_t(c')}\;\tau_t$  (7)

So, the final unsupervised loss can be formulated as [23]:

$\mathcal{L}_u = \frac{1}{\mu B}\sum_{b=1}^{\mu B} \mathbb{1}\big(\max(q_b) \ge \tau_t(\arg\max(q_b))\big)\; \mathcal{H}\big(\hat{q}_b,\, Q_b\big)$  (8)
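The threshold updates of Eqs (5)–(7) can be sketched as a small stateful helper; this is an illustrative sketch (class and method names are ours, and the momentum value is an assumption), not the official implementation:

```python
import torch

class AdaptiveThreshold:
    """Sketch of self-adaptive thresholding, Eqs (5)-(7): an EMA global
    threshold, scaled per class by max-normalized average class probabilities."""

    def __init__(self, num_classes, momentum=0.999):
        self.m = momentum
        self.tau_g = torch.tensor(1.0 / num_classes)              # global threshold, init 1/C
        self.p_c = torch.full((num_classes,), 1.0 / num_classes)  # per-class EMA estimate

    def update(self, probs_weak):
        # probs_weak: (B, C) softmax outputs on weakly augmented unlabeled data
        conf, _ = probs_weak.max(dim=-1)
        self.tau_g = self.m * self.tau_g + (1 - self.m) * conf.mean()        # Eq (5)
        self.p_c = self.m * self.p_c + (1 - self.m) * probs_weak.mean(dim=0) # Eq (6)

    def class_thresholds(self):
        # Eq (7): max-normalize the class estimates and scale by the global threshold
        return (self.p_c / self.p_c.max()) * self.tau_g
```

A sample with pseudo-label c would then be retained when its confidence exceeds `class_thresholds()[c]`, so well-learned classes get stricter thresholds while harder classes keep admitting samples.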

3.3 Refining noisy pseudo labels

Contrastive learning (CL) aims to learn universal prior information that can be applied to downstream tasks. In this approach, we use CL to extract universal prior knowledge from positive and negative samples and leverage it to enhance generalization performance in downstream tasks [59]. A common strategy in CL is to pull positive pairs (which are semantically similar) closer together and push negative pairs (which are semantically dissimilar) farther apart. Conventional CL methods leverage unlabeled samples in an unsupervised fashion. In contrast, using the pseudo-labels obtained from self-adaptive thresholding on the unlabeled samples, we construct positive and negative pairs in a supervised CL manner [59], treating the pseudo-labels as if label information were available. However, the obtained pseudo-labels can be noisy, which can lead to poor generalization performance. Supervised CL enhances multi-domain learning and captures class-specific sample-to-sample relationships across the diverse source domains; inspired by [27], we therefore assume that some pseudo-labels are noisy and use supervised contrastive learning to align these hard samples. We use unsupervised CL for warm-up training, where low-dimensional representations and pseudo-labels are given. Our goal is to measure the similarity of given samples using the cosine distance:

$\mathrm{sim}(z_i, z_j) = \frac{z_i \cdot z_j}{\lVert z_i\rVert\,\lVert z_j\rVert}$  (9)

Here, $z_i$ and $z_j$ are the low-dimensional representations. For each sample with pseudo-label $\hat{y}_j$, we aggregate a corrected label from its top-K neighbors according to the similarity of their representations. In this way, we improve the detection of mislabeled pseudo-labeled samples. To obtain more confident labels, we use the per-class α-fractile, which measures the agreement between the neighbor-corrected labels and the original pseudo-labels across all classes [60,61]. After identifying the less noisy samples, we construct a set for representation learning. This set also helps us identify whether two given instances belong to the same class.
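The neighbor-based refinement step can be sketched as follows: each sample's pseudo-label is compared against the majority vote of its top-K cosine neighbors (Eq 9), and disagreement flags the sample as potentially noisy. This is a minimal sketch; the function name, K, and the majority-vote aggregation are our illustrative assumptions (the paper additionally applies a per-class α-fractile on the agreements):

```python
import torch
import torch.nn.functional as F

def refine_pseudo_labels(feats, pseudo, k=5, num_classes=None):
    """Replace each pseudo-label by the majority vote of its top-K cosine
    neighbors; `agree` marks samples whose original pseudo-label survives."""
    z = F.normalize(feats, dim=1)
    sim = z @ z.t()                        # pairwise cosine similarity
    sim.fill_diagonal_(-float("inf"))      # exclude self from neighbors
    nn_idx = sim.topk(k, dim=1).indices    # top-K neighbors per sample
    neigh_labels = pseudo[nn_idx]          # (N, K) neighbor pseudo-labels
    C = num_classes or int(pseudo.max()) + 1
    votes = F.one_hot(neigh_labels, C).sum(dim=1)  # per-class vote counts
    refined = votes.argmax(dim=1)
    agree = (refined == pseudo)            # agreement flags the less noisy samples
    return refined, agree
```

Samples with `agree == True` would form the low-noise set used for supervised contrastive learning, while the rest fall back to the unsupervised objective.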

Supervised contrastive learning. We use a supervised CL loss that exploits the presence of labels: all samples from the same class are treated as positives, and the remaining samples as negatives. This loss enhances representation learning from the selected less noisy samples. The supervised CL objective can be written as:

$\mathcal{L}_{con} = \sum_{i\in I} \frac{-1}{|P(i)|} \sum_{p\in P(i)} \log \frac{\exp(z_i \cdot z_p / T)}{\sum_{a\in A(i)} \exp(z_i \cdot z_a / T)}$  (10)

Here, T is a temperature parameter, $P(i)$ is the set of positives sharing sample i’s label, and $A(i)$ is the set of all other samples. While the supervised loss is applied to the less noisy samples, we perform unsupervised CL on the remaining unselected samples, following [61].

Final training objective. Finally, combining all losses, we obtain the overall objective:

$\mathcal{L} = \mathcal{L}_s + \mathcal{L}_u + w_{con}\,\mathcal{L}_{con}$  (11)

where $w_{con}$ represents the loss weight for $\mathcal{L}_{con}$.

Algorithm 1. CAT.

1: Input: Labeled dataset $\mathcal{D}_L$, unlabeled dataset $\mathcal{D}_U$, model f with parameters θ, batch size B, unlabeled batch ratio μ, initial confidence threshold $\tau_0 = 1/C$

2: Initialize: Model parameters θ, EMA decay λ, class-specific confidence scores $\tilde{p}_0(c) = 1/C$

3: for each training step t do

4:   Sample labeled batch $\{(x_i, y_i)\}_{i=1}^{B}$ of size B

5:   Sample unlabeled batch $\{u_b\}_{b=1}^{\mu B}$ of size $\mu B$

   {Compute supervised loss}

6:   $\mathcal{L}_s = \frac{1}{B}\sum_{i=1}^{B} \mathcal{H}(y_i, f(x_i; \theta))$

   {Generate pseudo-labels for unlabeled data}

7:   for each $u_b$ in unlabeled batch do

8:    $q_b = f(\mathrm{weak}(u_b); \theta)$

9:    $\hat{q}_b = \arg\max(q_b)$

10:    if $\max(q_b) \ge \tau_t(\hat{q}_b)$ then retain $(u_b, \hat{q}_b)$

11:   end for

   {Compute unsupervised loss}

12:   Compute $\mathcal{L}_u$ on the retained samples via Eq (8)

   {Update global threshold using EMA}

13:   Update $\tau_t$ via Eq (5)

   {Class-specific thresholding}

14:   Update $\tilde{p}_t(c)$ via Eq (6)

15:   Compute $\tau_t(c)$ via Eq (7)

   {Update model parameters}

16:   $\theta \leftarrow \theta - \eta\,\nabla_\theta\big(\mathcal{L}_s + \mathcal{L}_u + w_{con}\,\mathcal{L}_{con}\big)$

17: end for

18: Return: Trained model f with parameters θ

4 Experimental settings

4.1 Datasets

We use four publicly available datasets, PACS [21], OfficeHome [62], VLCS [63], and miniDomainNet [20], to evaluate our model against other baselines on semi-supervised domain generalization tasks. PACS contains 7 classes of images from 4 distinct domains (Photo - P, Art Painting - A, Cartoon - C, and Sketch - S). OfficeHome contains images from 4 different domains (Artistic - A, Clip art - C, Product - P, and Real-world - R); it is a relatively large dataset with 65 distinct classes covering daily-life objects found in offices and homes. We also use miniDomainNet, a subset of DomainNet with 4 different domains (Clipart - C, Painting - P, Real - R, and Sketch - S) covering 126 distinct classes. We report the average accuracy over the last five epochs as the final result. A summary of the datasets is given in Table 3.

Table 3. Summary of PACS, OfficeHome, VLCS, and miniDomainNet datasets, including the number of samples, domains, and domain names.

https://doi.org/10.1371/journal.pone.0329799.t003

4.2 Implementation details

We followed the protocol described in [21,22], which is common practice in the domain generalization setting. We utilize the leave-one-domain-out method, in which the model is trained on n−1 domains from the training dataset and evaluated on the remaining domain [21]. Pre-trained ResNet-18 and ResNet-50 variants [64] are used as the backbone of the model. Following [22], we randomly sample 16 images from each source domain to construct mini-batches with labeled and unlabeled data. With guidance from the labeled data, we generate the pseudo and proxy labels using the unlabeled data. The learning rate is set to 0.003; we examined multiple learning rates to find the best one. We use the SGD optimizer with a standard momentum of 0.9, and train for 40, 20, and 20 epochs on PACS, OfficeHome, and miniDomainNet, respectively, with early stopping based on validation accuracy (patience of 5 epochs). All models are trained using an RTX 3090 GPU. Our implementation is based on the Dassl.pytorch [20] toolbox. For data preprocessing, we follow the standard practice used in [21,22]: input images are resized to 224×224 and augmented with random cropping and horizontal flipping. We set the loss weight to 1 for all experimental cases. We use the official train-validation split of each dataset for validation, and labeled samples are randomly sampled from the training split. All samples of the target domain are used as test data.
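The leave-one-domain-out protocol described above can be sketched as a simple split generator (the function name is ours):

```python
def leave_one_domain_out(domains):
    """Each domain in turn is held out as the unseen target;
    the remaining n-1 domains serve as training sources."""
    splits = []
    for i, target in enumerate(domains):
        sources = [d for j, d in enumerate(domains) if j != i]
        splits.append((sources, target))
    return splits
```

For PACS, `leave_one_domain_out(["P", "A", "C", "S"])` yields four runs, e.g. training on A, C, S and testing on P, whose accuracies are then averaged.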

5 Experimental results

5.1 Comparison with state-of-the-art methods

In this experiment, we compare our method with multiple state-of-the-art methods on standard DG datasets to verify its effectiveness. We divide the comparison into four paradigms (i.e., fully labeled, domain generalization methods, semi-supervised methods, and semi-supervised domain generalization methods). In the fully labeled setting, all source labels are available during training under the conventional DG settings. In the DG setting, we compare our method with vanilla training, CrossGrad [65], DDAIG [66], RSC [67], and EISNet [68], where EISNet also utilizes unlabeled samples during training. In the SSL setting, we compare our method with traditional methods such as MeanTeacher [46], EntMin [49], FixMatch [13], and FreeMatch [23]. In the SSDG setting, we compare our method with StyleMatch [22] and MultiMatch [56], as these two approaches have similar evaluation settings and provide official code. We borrow the results reported by StyleMatch and MultiMatch in Tables 4 and 5.

Main results. Here, full-labels refers to training ERM with all labels in the source domains. Table 4 presents the domain generalization performance of various models in the low-data regime, evaluated on four benchmark datasets: PACS, OfficeHome, VLCS, and miniDomainNet. The baseline "Full-Labels," representing a fully supervised model trained with labeled data, achieves an average accuracy of 79.50% across all datasets in both labeling settings. This serves as a reference point to assess the effectiveness of SSDG methods. Among the SSDG methods, StyleMatch demonstrates reasonable performance, achieving average accuracies of 80.41% and 80.32% for the 10-label and 5-label settings, respectively. However, its reliance on fixed thresholding limits its ability to fully utilize unlabeled data. Similarly, MultiMatch performs slightly worse, with average accuracies of 79.10% and 78.18% for the respective labeling scenarios. In contrast, the proposed method, CAT, achieves superior results across all datasets and labeling conditions. For the 10-label setting, CAT achieves an average accuracy of 82.00%, and for the 5-label setting, it achieves 82.71%, outperforming StyleMatch and MultiMatch by notable margins. CAT’s adaptive thresholding strategy, which incorporates both class-specific and domain-specific information, enables effective utilization of unlabeled data, contributing to its improved performance. When evaluated on individual datasets, CAT consistently achieves the highest accuracy. For instance, on PACS, it achieves 82.95% and 82.71% for the 10-label and 5-label settings, respectively. Similarly, on OfficeHome, CAT records 75.23% and 75.50%. On VLCS, CAT achieves outstanding results of 93.43% and 93.00%, and on miniDomainNet, it obtains 80.10% and 76.19% under the respective label conditions. As shown in the last row of Table 4, all p-values are below 0.0005, indicating that CAT’s improvements are statistically significant at the 0.05 level. 
This confirms that the observed gains are not due to random chance and highlights the robustness of CAT across diverse domains and data scarcity levels. In summary, the results in Table 4 demonstrate that CAT effectively addresses the challenges of semi-supervised domain generalization in low-data regimes. By leveraging adaptive thresholding, CAT consistently outperforms existing methods across diverse datasets and labeling conditions, highlighting its robustness and practicality for real-world applications.

Table 4. Domain generalization results (%) in the low-data regime with a comparison of various models in SSDG settings, evaluated on all datasets. Results are reported as mean ± standard deviation over 5 random seeds. Here, u denotes utilization of unlabeled data. Paired t-tests were conducted between CAT and other baselines, with p-values shown in the last row.

https://doi.org/10.1371/journal.pone.0329799.t004

Results on PACS. Table 5 provides a detailed comparison of model performance on the PACS dataset in a low-data regime. The Full-Labels model, trained with all labeled data, serves as the upper bound, achieving an average accuracy of 79.50% across both settings. Among the DG methods, which generalize across domains without leveraging unlabeled data, models like Vanilla, CrossGrad, and RSC perform moderately, with RSC achieving average accuracies of 63.96% (10 labels) and 57.31% (5 labels). EISNet, which does use unlabeled data, shows better performance, reaching 67.18% and 62.04% average accuracies for the two setups, respectively. SSL methods, which utilize unlabeled data to improve performance, generally outperform DG methods. Notable among them are EntMin and FixMatch, with the latter achieving average accuracies of 75.57% (10 labels) and 70.87% (5 labels). However, FreeMatch exhibits suboptimal adaptation, performing significantly worse with average accuracies of 57.13% and 42.75%, respectively. The SSDG methods, which combine the strengths of DG and SSL, deliver the best results. The proposed CAT (Ours) model achieves state-of-the-art performance, with an average accuracy of 82.95% in the 10-label setting and 82.71% in the 5-label setting, improving over the next-best method, StyleMatch, by 2.54% and 2.39%, respectively. These results underscore the effectiveness of CAT in leveraging both labeled and unlabeled data to handle domain shifts and achieve robust generalization. In summary, Table 5 demonstrates that while DG methods struggle without unlabeled data and SSL methods falter under domain shifts, SSDG methods, particularly CAT, excel by addressing both challenges, achieving superior performance even in extreme low-data scenarios. Paired t-tests further confirm that CAT's improvements are statistically significant at the 0.05 level.

Table 5. Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on PACS (Photo: P, Art: A, Cartoon: C, and Sketch: S). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 random seeds.

https://doi.org/10.1371/journal.pone.0329799.t005

Results on OfficeHome. Table 6 provides a detailed comparison of model performance on the OfficeHome dataset in a low-data regime, evaluating models across the four experimental settings (fully labeled, DG, SSL, and SSDG). The Full-Labels model, trained with fully labeled data, serves as the upper bound, achieving an average accuracy of 64.70% across domains. Among the DG methods, which generalize across domains without using unlabeled data, models such as Vanilla, CrossGrad, and RSC achieve average accuracies of around 57–58% in the 10-label setting and 52–53% in the 5-label setting. RSC and EISNet show slightly better performance due to their enhanced domain generalization capabilities. In contrast, SSL methods like MeanTeacher, EntMin, and FixMatch, which utilize both labeled and unlabeled data, outperform DG methods. For instance, FixMatch+RSC, which combines SSL and domain generalization, achieves average accuracies of 58.88% with 10 labels and 53.91% with 5 labels. SSDG methods, which integrate SSL and DG capabilities, deliver the highest performance across all metrics. Notably, the proposed CAT (Ours) model outperforms all other approaches, achieving an average accuracy of 65.04% in the 10-label setting and 61.71% in the 5-label setting. These results surpass the next-best SSDG method (MultiMatch) by 4.85% and 3.56%, respectively. The significant improvements of CAT highlight its ability to effectively leverage both labeled and unlabeled data while addressing domain shifts. In summary, the results demonstrate that DG methods generalize across domains but fall short without access to unlabeled data, while SSL methods improve performance by utilizing unlabeled data but do not account for domain shifts. SSDG methods, particularly CAT, combine the strengths of both approaches, achieving superior generalization and robustness in low-data scenarios. Paired t-tests further confirm that CAT's improvements are statistically significant at the 0.05 level.

Table 6. Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on OfficeHome (Art: A, Clipart: C, Product: P, and Real-World: R). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 runs.

https://doi.org/10.1371/journal.pone.0329799.t006

Results on miniDomainNet. Table 7 summarizes the results of different models evaluated on the miniDomainNet dataset under a low-data regime. The Full-Labels model achieves the best performance, setting an upper limit with average accuracies of 68.18% in the 10-label setting and 66.27% in the 5-label setting. These results represent the optimal scenario where full supervision is available. Among SSDG methods, StyleMatch achieves average accuracies of 63.32% (10-label) and 61.26% (5-label), demonstrating its ability to leverage unlabeled data for domain generalization. However, it is surpassed by MultiMatch, which improves the average accuracies to 64.55% and 63.70% for the two settings, respectively, indicating stronger capabilities for handling domain shifts. CAT significantly outperforms the other SSDG methods, achieving state-of-the-art average accuracies of 67.71% in the 10-label setting and 66.32% in the 5-label setting. These results closely approach the performance of the fully supervised Full-Labels model, demonstrating the model's effectiveness in leveraging both labeled and unlabeled data. Compared to StyleMatch, CAT achieves a +4.39% improvement in the 10-label setting and a +5.06% improvement in the 5-label setting, while also outperforming MultiMatch by +3.16% and +2.62%, respectively. In conclusion, the results highlight the superior performance of CAT in addressing the challenges of domain generalization and limited labeled data. Its ability to achieve results comparable to the Full-Labels model makes it a robust solution for real-world low-data scenarios on the miniDomainNet dataset. Paired t-tests further confirm that CAT's improvements are statistically significant at the 0.05 level.

Table 7. Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on miniDomainNet (Clipart: C, Painting: P, Real: R, and Sketch: S). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 runs.

https://doi.org/10.1371/journal.pone.0329799.t007

Inference time. We assess the runtime efficiency of our proposed CAT and StyleMatch on the PACS dataset with a ResNet-18 backbone. We compare the models on an NVIDIA RTX 3090 GPU with a batch size of 64. As shown in Table 8, CAT achieves lower inference time and total test time than StyleMatch, indicating that our framework is better suited to low-latency or real-time applications.
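The timing protocol behind Table 8 can be sketched as follows. This is a simplified, CPU-only illustration with a stand-in model, not our actual benchmarking code; on a GPU, a device synchronization call (e.g. `torch.cuda.synchronize()`) would be required before each timestamp so that asynchronous kernels are fully counted.

```python
import time

def measure_inference_time(model, batches, warmup=5):
    """Average per-batch inference latency, excluding warm-up iterations.

    Warm-up runs absorb one-time costs (caching, lazy initialization)
    so they do not bias the reported per-batch latency.
    """
    for batch in batches[:warmup]:
        model(batch)
    times = []
    for batch in batches[warmup:]:
        start = time.perf_counter()
        model(batch)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Stand-in "model": sums each of 64 feature vectors in a batch.
dummy_model = lambda batch: [sum(x) for x in batch]
batches = [[[0.0] * 512 for _ in range(64)] for _ in range(20)]
print(f"mean latency: {measure_inference_time(dummy_model, batches) * 1e3:.3f} ms")
```

Total test time is then the mean latency multiplied by the number of batches in the test split.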

Table 8. Inference time comparison on the PACS dataset using ResNet-18 backbone and NVIDIA RTX 3090 GPU.

https://doi.org/10.1371/journal.pone.0329799.t008

5.2 Ablation studies

Effectiveness of different backbones. In Table 9, we compare ResNet-18 and ResNet-50 backbones; CAT consistently outperforms both StyleMatch and MultiMatch across all domains and label settings. Specifically, with ResNet-18, CAT achieves average performance scores of 82.95% and 82.71% for the 10-label and 5-label configurations, respectively. With ResNet-50, CAT performs even better, reaching average scores of 85.29% and 85.05% in the same two label settings. StyleMatch shows competitive performance, but CAT consistently surpasses it, especially in the 10-label setting. For instance, with ResNet-50 and 10 labels per class, StyleMatch achieves an average score of 82.45%, while CAT achieves a significantly higher average of 85.29%. MultiMatch, while also competitive, does not match the performance of CAT with either backbone. Overall, the results suggest that the proposed CAT method is more effective in SSDG tasks than StyleMatch and MultiMatch. Moreover, the deeper ResNet-50 backbone outperforms ResNet-18 across both label configurations, indicating that a higher-capacity network architecture benefits performance in this task.

Table 9. Backbone comparison of ResNet-18 and ResNet-50 in SSDG settings on PACS. Results are reported as mean ± standard deviation over 5 random seeds.

https://doi.org/10.1371/journal.pone.0329799.t009

Effect of different numbers of labels. In Fig 3, we vary the amount of labeled data to validate the performance of our method, comparing against two SSDG methods, StyleMatch and MultiMatch. In every label setting, our method outperforms both: it improves performance by about 1.5% over MultiMatch, which in turn outperforms StyleMatch by a similar margin. Hence, these results demonstrate the effectiveness of our method even as the label budget approaches the fully supervised setting.

Fig 3. Comparison between our method with StyleMatch and MultiMatch in different label settings.

https://doi.org/10.1371/journal.pone.0329799.g003

Effect of different numbers of source domains. In Table 10, we examine the impact of the number of sources (K) on the performance of three models—FixMatch, StyleMatch, and CAT (the proposed method)—on the PACS dataset, under two settings of label availability: 10 labels per class and 5 labels per class. The results, reported as accuracy percentages, highlight the influence of K (number of source domains) and the availability of labeled data on the models’ performance. The results reveal that increasing the number of sources (K) consistently improves accuracy across all models. For instance, FixMatch shows notable improvements as K increases from 1 to 3, but it lags behind StyleMatch and CAT in every configuration. StyleMatch demonstrates better utilization of domain information, consistently outperforming FixMatch across both label regimes. However, CAT significantly surpasses both FixMatch and StyleMatch in all scenarios, indicating its superior capability in leveraging both labeled and unlabeled data for domain generalization. With 10 labels per class, CAT achieves the highest accuracy, with 61.32% for K = 1, 78.92% for K = 2, and 82.95% for K = 3. Even in the low-data regime of 5 labels per class, CAT maintains its dominance, achieving 57.64% for K = 1, 74.26% for K = 2, and 82.71% for K = 3. These results highlight the model’s robustness and scalability, particularly as the number of source domains (K) increases. In summary, the findings demonstrate that CAT consistently outperforms FixMatch and StyleMatch, especially as the number of sources grows. Furthermore, it shows remarkable robustness in low-data scenarios, confirming its effectiveness in domain generalization tasks under varying conditions of labeled data availability.

Table 10. Impact of the number of source domains (K) on the PACS dataset with varying label availability: 10 labels per class and 5 labels per class.

https://doi.org/10.1371/journal.pone.0329799.t010

6 Conclusion

In this work, we explore the challenging area of semi-supervised domain generalization (SSDG) to handle domain shifts under a low-data regime. In recent years, SSDG has become a more practical solution for many real-world applications. Hence, we propose CAT, an SSDG method that addresses the limitations of existing approaches by leveraging adaptive thresholding and noisy label refinement techniques to generate reliable pseudo-labels and enhance generalization. By employing both global and local adaptive thresholds, our method ensures improved class diversity and dynamic confidence management in pseudo-label generation. Additionally, the integration of supervised contrastive learning with refined pseudo-labels enables the model to capture domain-invariant representations effectively. Experimental results demonstrate the effectiveness of our method as an SSDG solution.
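The interplay of global and local adaptive thresholds summarized above can be illustrated with a simplified, FreeMatch-style sketch. The EMA momentum, the initialization of the thresholds at 1/C, and the exact rule combining the global level with per-class confidence are illustrative assumptions for exposition, not our exact implementation.

```python
import numpy as np

def adaptive_mask(probs, global_t, class_t, momentum=0.999):
    """Select confident pseudo-labels with class-aware adaptive thresholds.

    probs:    (N, C) softmax outputs for a batch of unlabeled samples.
    global_t: scalar EMA of the mean maximum confidence (global threshold).
    class_t:  (C,) EMA of the mean per-class confidence (local estimates).
    Returns the selection mask, pseudo-labels, and updated threshold states.
    """
    conf = probs.max(axis=1)           # max confidence per sample
    pseudo = probs.argmax(axis=1)      # hard pseudo-labels
    # EMA updates of the global and per-class confidence estimates.
    global_t = momentum * global_t + (1 - momentum) * conf.mean()
    class_t = momentum * class_t + (1 - momentum) * probs.mean(axis=0)
    # Local threshold: global level modulated by relative class confidence,
    # so under-learned classes get lower thresholds (better class diversity).
    thresholds = global_t * class_t / class_t.max()
    mask = conf >= thresholds[pseudo]  # keep samples above their class threshold
    return mask, pseudo, global_t, class_t

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(7), size=256)   # 7 classes, as in PACS
mask, pseudo, g, c = adaptive_mask(probs, global_t=1 / 7, class_t=np.full(7, 1 / 7))
print(f"{mask.sum()} of {len(mask)} samples selected")
```

Early in training, low average confidence keeps the thresholds low, admitting diverse pseudo-labels; as confidence grows, the thresholds tighten and filter out unreliable predictions.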

In future work, researchers can focus on the continual adaptation of domains under distribution change and label scarcity. Second, uncertainty-guided model generalization under severe domain shift can be explored. Third, the lightweight integration of vision-language models into SSDG settings merits investigation. Finally, generalization under the open-set setting remains an open challenge in SSDG.

References

  1. Sanchez J, Deschaud JE, Goulette F. Domain generalization of 3D semantic segmentation in autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023. p. 18077–87.
  2. Yang F, Chen H, He Y, Zhao S, Zhang C, Ni K, et al. Geometry-guided domain generalization for monocular 3D object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38; 2024. p. 6467–76.
  3. Chen L, Zhang Y, Song Y, Van Den Hengel A, Liu L. Domain generalization via rationale invariance. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). 2023. p. 1751–60. https://doi.org/10.1109/iccv51070.2023.00168
  4. Chapelle O, Schölkopf B, Zien A, editors. Semi-supervised learning. MIT Press; 2006.
  5. Hady MFA, Schwenker F. Semi-supervised learning. In: Handbook on neural information processing. 2013. p. 215–39.
  6. Zhu X, Goldberg AB. Introduction to semi-supervised learning. Springer; 2022.
  7. Berthelot D, Carlini N, Goodfellow I, Papernot N, Oliver A, Raffel CA. MixMatch: a holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems. 2019;32.
  8. Yang X, Song Z, King I, Xu Z. A survey on deep semi-supervised learning. IEEE Trans Knowl Data Eng. 2023;35(9):8934–54.
  9. Lee DH, et al. Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, ICML. Atlanta; 2013. p. 896.
  10. Bachman P, Alsharif O, Precup D. Learning with pseudo-ensembles. Advances in Neural Information Processing Systems. 2014;27.
  11. Cascante-Bonilla P, Tan F, Qi Y, Ordonez V. Curriculum labeling: revisiting pseudo-labeling for semi-supervised learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2021. p. 6912–20.
  12. Abuduweili A, Li X, Shi H, Xu CZ, Dou D. Adaptive consistency regularization for semi-supervised transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. p. 6923–32.
  13. Sohn K, Berthelot D, Carlini N, Zhang Z, Zhang H, Raffel CA. FixMatch: simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems. 2020;33:596–608.
  14. Verma V, Kawaguchi K, Lamb A, Kannala J, Solin A, Bengio Y, et al. Interpolation consistency training for semi-supervised learning. Neural Netw. 2022;145:90–106. pmid:34735894
  15. Zhang M, Marklund H, Dhawan N, Gupta A, Levine S, Finn C. Adaptive risk minimization: learning to adapt to domain shift. Advances in Neural Information Processing Systems. 2021;34:23664–78.
  16. Stacke K, Eilertsen G, Unger J, Lundstrom C. Measuring domain shift for deep learning in histopathology. IEEE J Biomed Health Inform. 2021;25(2):325–36. pmid:33085623
  17. Chen Y, Wei C, Kumar A, Ma T. Self-training avoids using spurious features under domain shift. Advances in Neural Information Processing Systems. 2020;33:21061–71.
  18. Zhou K, Liu Z, Qiao Y, Xiang T. Domain generalization: a survey. IEEE Trans Pattern Anal Mach Intell. 2023;45(4):4396–415. pmid:35914036
  19. Wang J, Lan C, Liu C, Ouyang Y, Qin T, Lu W, et al. Generalizing to unseen domains: a survey on domain generalization. IEEE Trans Knowl Data Eng. 2022;1.
  20. Zhou K, Yang Y, Qiao Y, Xiang T. Domain adaptive ensemble learning. IEEE Trans Image Process. 2021;30:8008–18. pmid:34534081
  21. Li D, Yang Y, Song YZ, Hospedales TM. Deeper, broader and artier domain generalization. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. p. 5542–50.
  22. Zhou K, Loy CC, Liu Z. Semi-supervised domain generalization with stochastic StyleMatch. Int J Comput Vis. 2023;131(9):2377–87.
  23. Wang Y, Chen H, Heng Q, Hou W, Fan Y, Wu Z. FreeMatch: self-adaptive thresholding for semi-supervised learning. arXiv preprint 2022.
  24. Guo LZ, Li YF. Class-imbalanced semi-supervised learning with adaptive thresholding. In: International Conference on Machine Learning. PMLR; 2022. p. 8082–94.
  25. Xu Y, Shang L, Ye J, Qian Q, Li YF, Sun B. Dash: semi-supervised learning with dynamic thresholding. In: International Conference on Machine Learning. PMLR; 2021. p. 11525–36.
  26. Berthelot D, Roelofs R, Sohn K, Carlini N, Kurakin A. AdaMatch: a unified approach to semi-supervised learning and domain adaptation. 2021. https://arxiv.org/abs/2106.04732
  27. Yao X, Bai Y, Zhang X, Zhang Y, Sun Q, Chen R. PCL: proxy-based contrastive learning for domain generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. p. 7097–107.
  28. Muandet K, Balduzzi D, Schölkopf B. Domain generalization via invariant feature representation. In: International Conference on Machine Learning. PMLR; 2013. p. 10–8.
  29. Li H, Wang Y, Wan R, Wang S, Li TQ, Kot A. Domain generalization for medical imaging classification with linear-dependency regularization. Advances in Neural Information Processing Systems. 2020;33:3118–29.
  30. Li Y, Gong M, Tian X, Liu T, Tao D. Domain generalization via conditional invariant representations. AAAI. 2018;32(1).
  31. Balaji Y, Sankaranarayanan S, Chellappa R. MetaReg: towards domain generalization using meta-regularization. Advances in Neural Information Processing Systems. 2018;31.
  32. Li Y, Yang Y, Zhou W, Hospedales T. Feature-critic networks for heterogeneous domain generalization. In: International Conference on Machine Learning. PMLR; 2019. p. 3915–24.
  33. Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F. Domain-adversarial training of neural networks. Journal of Machine Learning Research. 2016;17(59):1–35.
  34. Zhou K, Yang Y, Qiao Y, Xiang T. MixStyle neural networks for domain generalization and adaptation. International Journal of Computer Vision. 2024;132(3):822–36.
  35. Mancini M, Akata Z, Ricci E, Caputo B. Towards recognizing unseen categories in unseen domains. In: European Conference on Computer Vision. Springer; 2020. p. 466–83.
  36. Xu Z, Liu D, Yang J, Raffel C, Niethammer M. Robust and generalizable visual representation learning via random convolutions. arXiv preprint 2020. https://arxiv.org/abs/2007.13003
  37. Ding Z, Fu Y. Deep domain generalization with structured low-rank constraint. IEEE Trans Image Process. 2018;27(1):304–13. pmid:28976316
  38. D'Innocente A, Caputo B. Domain generalization with domain-specific aggregation modules. In: Pattern Recognition: 40th German Conference, GCPR 2018, Stuttgart, Germany, October 9-12, 2018, Proceedings 40. Springer; 2019. p. 187–98.
  39. Seo S, Suh Y, Kim D, Kim G, Han J, Han B. Learning to optimize domain specific normalization for domain generalization. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXII 16. Springer; 2020. p. 68–83.
  40. Mancini M, Bulo SR, Caputo B, Ricci E. Robust place categorization with deep domain generalization. IEEE Robotics and Automation Letters. 2018;3(3):2093–100.
  41. Noroozi M, Favaro P. Unsupervised learning of visual representations by solving jigsaw puzzles. In: European Conference on Computer Vision. Springer; 2016. p. 69–84.
  42. Gidaris S, Singh P, Komodakis N. Unsupervised representation learning by predicting image rotations. arXiv preprint 2018. https://arxiv.org/abs/1803.07728
  43. Kim D, Yoo Y, Park S, Kim J, Lee J. SelfReg: self-supervised contrastive regularization for domain generalization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. p. 9619–28.
  44. Cha J, Lee K, Park S, Chun S. Domain generalization by mutual-information regularization with pre-trained models. In: European Conference on Computer Vision. Springer; 2022. p. 440–57.
  45. Tai KS, Bailis PD, Valiant G. Sinkhorn label allocation: semi-supervised classification via annealed self-training. In: International Conference on Machine Learning. PMLR; 2021. p. 10065–75.
  46. Tarvainen A, Valpola H. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems. 2017;30.
  47. Ke Z, Wang D, Yan Q, Ren J, Lau RW. Dual student: breaking the limits of the teacher in semi-supervised learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019. p. 6728–36.
  48. Luo Y, Zhu J, Li M, Ren Y, Zhang B. Smooth neighbors on teacher graphs for semi-supervised learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8896–905.
  49. Grandvalet Y, Bengio Y. Semi-supervised learning by entropy minimization. Advances in Neural Information Processing Systems. 2004;17.
  50. Xie Q, Dai Z, Hovy E, Luong T, Le Q. Unsupervised data augmentation for consistency training. Advances in Neural Information Processing Systems. 2020;33:6256–68.
  51. Chen B, Jiang J, Wang X, Wan P, Wang J, Long M. Debiased self-training for semi-supervised learning. Advances in Neural Information Processing Systems. 2022;35:32424–37.
  52. Zhao Z, Zhou L, Wang L, Shi Y, Gao Y. LaSSL: label-guided self-training for semi-supervised learning. AAAI. 2022;36(8):9208–16.
  53. Wei C, Sohn K, Mellina C, Yuille A, Yang F. CReST: a class-rebalancing self-training framework for imbalanced semi-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. p. 10857–66.
  54. Amini M-R, Feofanov V, Pauletto L, Hadjadj L, Devijver É, Maximov Y. Self-training: a survey. Neurocomputing. 2025;616:128904.
  55. Zhang L, Li JF, Wang W. Semi-supervised domain generalization with known and unknown classes. Advances in Neural Information Processing Systems. 2023;36:28735–47.
  56. Qi L, Yang H, Shi Y, Geng X. MultiMatch: multi-task learning for semi-supervised domain generalization. ACM Trans Multimedia Comput Commun Appl. 2024;20(6):1–21.
  57. Jayanaga Galappaththige C, Izzo Z, He X, Zhou H, Haris Khan M. Domain-guided weight modulation for semi-supervised domain generalization. arXiv preprint 2024.
  58. Cubuk ED, Zoph B, Shlens J, Le QV. RandAugment: practical automated data augmentation with a reduced search space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops; 2020. p. 702–3.
  59. Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P. Supervised contrastive learning. Advances in Neural Information Processing Systems. 2020;33:18661–73.
  60. Ortego D, Arazo E, Albert P, O'Connor NE, McGuinness K. Multi-objective interpolation training for robustness to label noise. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021. p. 6606–15.
  61. Li S, Xia X, Ge S, Liu T. Selective-supervised contrastive learning with noisy labels. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022. p. 316–25. https://doi.org/10.1109/cvpr52688.2022.00041
  62. Venkateswara H, Eusebio J, Chakraborty S, Panchanathan S. Deep hashing network for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 5018–27.
  63. Torralba A, Efros AA. Unbiased look at dataset bias. In: CVPR 2011. IEEE; 2011. p. 1521–8.
  64. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. p. 770–8.
  65. Shankar S, Piratla V, Chakrabarti S, Chaudhuri S, Jyothi P, Sarawagi S. Generalizing across domains via cross-gradient training. arXiv preprint 2018.
  66. Zhou K, Yang Y, Hospedales T, Xiang T. Deep domain-adversarial image generation for domain generalisation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34; 2020. p. 13025–32.
  67. Huang Z, Wang H, Xing EP, Huang D. Self-challenging improves cross-domain generalization. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16. Springer; 2020. p. 124–40.
  68. Wang S, Yu L, Li C, Fu CW, Heng PA. Learning from extrinsic and intrinsic supervisions for domain generalization. In: European Conference on Computer Vision. Springer; 2020. p. 159–76.