Abstract
Domain generalization seeks to acquire knowledge from limited source data and apply it to an unknown target domain. Current approaches primarily tackle this challenge by attempting to eliminate the differences between domains. However, as cross-domain data evolves, the discrepancies between domains grow increasingly intricate and difficult to manage, rendering effective knowledge transfer across multiple domains a persistent challenge. While existing methods concentrate on minimizing domain discrepancies, they frequently encounter difficulties in maintaining effectiveness when confronted with high data complexity. In this paper, we present an approach that transcends merely eliminating domain discrepancies by enhancing the model’s adaptability to improve its performance in unseen domains. Specifically, we frame the problem as an optimization process with the objective of minimizing a weighted loss function that balances cross-domain discrepancies and sample complexity. Our proposed self-ensemble learning framework, which utilizes a single feature extractor, simplifies this process by alternately training multiple classifiers with shared feature extractors. The introduction of focal loss and complex sample loss weight further fine-tunes the model’s sensitivity to hard-to-learn instances, enhancing generalization to difficult samples. Finally, a dynamic loss adaptive weighted voting strategy ensures more accurate predictions across diverse domains. Experimental results on three public benchmark datasets (OfficeHome, PACS, and VLCS) demonstrate that our proposed algorithm achieves an improvement of up to 3.38% over existing methods in terms of generalization performance, particularly in complex and diverse real-world scenarios, such as autonomous driving and medical image analysis. These results highlight the practical utility of our approach in environments where cross-domain generalization is crucial for system reliability and safety.
Citation: Qin Z, Guo X, Li J, Chen Y (2025) Domain generalization for image classification based on simplified self ensemble learning. PLoS ONE 20(4): e0320300. https://doi.org/10.1371/journal.pone.0320300
Editor: Xu Yanwu
Received: August 1, 2024; Accepted: February 15, 2025; Published: April 4, 2025
Copyright: © 2025 Qin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data files are available from the Figshare database (https://figshare.com/s/bb6260738c65a19f5cd0).
Funding: The project numbered 2024KY0906 sponsored by Guangxi Education Department has furnished computing power support for this research (The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript).
Competing interests: The authors have declared that no competing interests exist.
Introduction
At present, deep learning research is developing rapidly, but its application in industry remains limited, because deep learning relies heavily on the oversimplified assumption that the training and test data are independent and identically distributed. In practice, domain shift [1,2] in cross-domain transfer causes models with excellent training performance to fail in real-world scenarios. Therefore, more and more research focuses on solving this problem, especially in critical applications such as autonomous driving [3–5], medical image analysis [6–8], and seismic signal classification [9,10].
From a statistical point of view, a model is obtained by establishing a conditional probability distribution over the sample space. When the marginal probability distribution of the input space and the conditional probability distribution of the task change, the data distribution shifts noticeably between the source and target domains. This change is called domain shift: the source and target domains no longer follow the same data distribution [11,12].
To address the impact of the domain shift problem, domain generalization [13–15] has become a research hotspot. More and more work tries to design unique learning methods or make better model selections to achieve higher performance. Domain-invariant feature learning methods [16–18] aim to identify the invariance of associations within training domain data. Stable learning methods [19–21] examine the impact of sample weights on model stability. Among them, ensemble learning [22–24] stands out and has become a mainstream research method in domain generalization. Ensemble learning can bypass the domain shift problem without resorting to more data, but it also brings some problems. For example, ensemble learning depends heavily on the number of models and the quality of data, so it requires many resources during training and cannot be used when computing resources are limited [25,26]. In addition, ensemble learning does not design a method that directly addresses domain shift but instead votes over the outputs of multiple models, which is conducive neither to few-shot tasks nor to improving the generalization performance of a single model [27,28].
In this paper, we propose a simplified self-ensemble learning method based on a shared feature extractor, which requires only one model and uses different classifiers for better generalization. First, we propose using the same feature extraction network to train multiple classifiers alternately so that different classifiers learn from different data and diverge from one another. Second, we introduce focal loss [29] and loss weighting to force the model to focus on more complex samples and learn domain-independent features of both complex and simple samples. Finally, weighted voting is performed on the prediction results of the classifiers according to the training accuracy weight of each classifier, so that the model outputs the prediction with the highest probability after weighted voting. Three benchmark datasets representing domain discrepancy are used for local experiments under the same hyperparameter settings.
This design has three advantages. First, the simplified self-ensemble learning framework consists of only one shared encoder and multiple classifiers of the same type, without the need to save additional copies of the model or train specific networks for different domains during training, which keeps the consumption of computing resources low. Second, the well-proven focal loss enables the network to focus on mining domain-invariant representations of complex samples and improves the model’s generalization performance in a targeted manner. Finally, the dynamic loss weighted voting strategy adjusts the weights of the classifiers’ predictions according to the size of the training loss in the current round, so that better-performing classifiers receive greater weights. This resolves the problem of classifiers with different learning abilities deciding the prediction result with equal weight. Our contributions can be summarized as follows:
- For the first time, we propose a simplified self-ensemble learning framework that only uses shared feature extractors and multiple classifiers, which significantly reduces computer resource consumption and avoids the problem that ensemble learning cannot be used due to resource constraints.
- We design a novel voting strategy that redistributes weights based on the predictive performance of each classifier before voting and uses the average result of weighted summation as the final output.
- We propose to use focal loss and cross-entropy to weigh the loss of complex samples to improve the model’s learning ability for complex samples, making it more conducive to learning domain-invariant representations.
- We demonstrate the reliability of the proposed algorithm by thoroughly investigating the domain generalization problem under natural and synthetic images on three benchmark datasets with different settings.
The Introduction section outlines the research background and objectives of the Simplified Self-Ensemble Learning (SSEL) approach. The Related Works section reviews theoretical work on domain generalization and ensemble learning. The Methods section presents the research problem, network model design, loss function construction, and training strategy. The Experiments section details the datasets, experimental baselines, and implementation. The Results section focuses on experimental results, while the Analysis section analyzes these results and related findings. Finally, the Conclusion section summarizes key findings and future research directions.
Related works
Domain generalization
The goal of domain generalization (DG) is to generalize a model learned from source domains to an arbitrary new target domain; the training and testing data do not belong to the same domain. Unlike domain adaptation, domain generalization is not given a specific target domain and instead aims for robust generalization performance on any unknown target domain. The current popular domain generalization methods include domain-invariant representation learning, data augmentation, meta-learning, and regularization.
Methods based on domain-invariant representation learning aim to learn domain-invariant features that are not perturbed by domain-specific representations, and typical approaches use adversarial learning to try to separate domain-invariant and domain-specific representations. For example, Nguyen et al. [30] proposed to learn domain-invariant representations using domain density transformation theory to force the representation network to remain invariant under transformation functions in arbitrary domains. Hu et al. [31] proposed to achieve domain-specific feature adversarial decoupling through maximum-minimum disentanglement to learn domain-invariant representations in the task of person re-identification. Hou et al. [32] designed a BatchFormer module to learn sample associations in each mini-batch. Lee et al. [33] proposed Multi-EPL, a multi-source domain adaptation method that uses label-wise moment matching and an ensemble of feature extractors to learn robust, domain-invariant representations.
Methods based on data augmentation enable the model to learn a broader range of features by enhancing the feature diversity of the existing data, which alleviates the poor generalization caused by limited data diversity. Mancini et al. [34] proposed to mix images and features simultaneously for data augmentation during training. Li et al. [35] proposed a randomized feature enhancement method that perturbs features with Gaussian noise to train the generalization model. Xu et al. [36] proposed data augmentation by linearly interpolating between the Fourier transform magnitude spectra of two images. Jiang et al. [37] proposed MeshCut, a data augmentation technique utilizing mesh masks to diversify image features and enhance model generalization.
Recently, meta-learning-based methods were introduced into DG. Meta-learning in DG mainly simulates the distribution difference between the training domain and the test domain by constructing a meta-training domain and a meta-test domain during model training, so that the model actively learns how to deal with domain shift in the unknown domain. For example, Finn et al. [38] proposed using first-order derivatives to compute meta-learning updates. Li et al. [39] trained the network by simulating meta-training data and meta-testing data. Shu et al. [40] proposed a new Dir-mixup method and distilled soft-labeling to enhance each domain and perform meta-learning accordingly. Zhao et al. [41] proposed a memory-based multi-source meta-learning strategy to solve the person re-identification problem. Xue et al. [42] leveraged meta-learning for efficient GFRP column data modeling with limited samples. Methods based on regularization strategies are often used in conjunction with other methods, such as meta-learning [43,44], data augmentation [45,46], and domain-invariant representation learning [47,48].
Ensemble learning
Ensemble learning has always been a hot topic in machine learning. Ensemble learning methods usually employ a combination of multiple copies of the same model with different weights or training data for ensemble prediction. This straightforward technique is very effective in improving model generalization [49,50]. The mainstream methods for applying ensemble learning in DG problems include ensemble SVM classifiers [51–53], domain-specific networks [54–56], and weight averages [57]. However, integrating SVM classifiers requires integrating different types of classifiers. Domain-specific networks require learning a separately used specific network for each domain. Weight averaging requires training multiple copies of the model and then averaging the weights of different models before merging. These existing methods have problems such as excessive resource consumption, repeated training, and not paying attention to complex local samples. Our proposed method does not need to train multiple copies repeatedly, only uses one encoder and multiple classifiers of the same type to achieve ensemble learning, and focuses on the case of locally complex samples.
Methods
Problem definition
Let $X$ be the input feature space and $Y$ be the label space. A domain is defined as a joint probability distribution $P_{XY}$ over $X \times Y$. For a given $P_{XY}$, we denote the marginal probability distribution over $X$ by $P_X$; $P_{Y|X}$ represents the posterior probability distribution of $Y$ given $X$, and $P_{X|Y}$ represents the conditional probability distribution of $X$ given $Y$. The deep learning model is defined as $F : X \to Y$. In domain generalization, we assume that $N$ different source domains are accessible. We denote the source domains by $S = \{S_k\}_{k=1}^{N}$ and the target domain by $T$, whose joint distribution is $P_{XY}^{T}$. Each source domain $S_k$ obeys a joint probability distribution $P_{XY}^{(k)}$. In general, $P_{XY}^{(k)} \neq P_{XY}^{(k')}$, $k \neq k'$, $k, k' \in \{1, \dots, N\}$. The labels of the source domains are accessible during model training, but the labels of the target domain are not. The goal of DG methods is to learn, from the $N$ visible source domains, a model $F$ that generalizes well and achieves minimum prediction error on unseen target domains. It can be expressed as:

$$\min_{F} \; \mathbb{E}_{(x, y) \sim P_{XY}^{T}} \left[ L\big(F(x), y\big) \right]$$

where $\mathbb{E}$ denotes the expectation and $L(\cdot)$ represents the loss function.
Model structure
Fig 1 shows our proposed model framework. The network consists of one encoder and n classifiers ($C_1(\cdot), \dots, C_n(\cdot)$), each with its own convolution module. The encoder is shared by all classifiers, and the classifiers learn different parameters through the randomized training of their convolution modules. In theory, the encoder can be substituted with various feature extraction networks, including ResNet, DenseNet, EfficientNet, MobileNetV2, RegNet, and Vision Transformer, among others. This paper consistently adheres to the experimental settings established by DomainBed and employs ResNet-18 and ResNet-50 as encoders for the subsequent experiments. The number of classifiers can be chosen freely within a limited range. In principle, the ensemble effect grows with the number of classifiers, but, similar to the principle of diminishing marginal returns, an excessive number of classifiers can adversely affect the model’s performance. It is worth noting that, as in other benchmark experiments, the classifier used in this paper is a single-layer fully connected network.
In Fig 1, C(⋅) denotes the independent classifier heads, and W represents the process of dynamic loss weighting.
The core of this network structure has two parts: (1) Our proposed simplified self-ensemble learning framework requires only one encoder and multiple classifiers with convolutional modules to achieve self-ensemble learning. It does not require additional time to train multiple replica models, nor does it need to save model parameters from different iteration cycles for parameter averaging. (2) We design a new dynamic loss adaptive weighted voting strategy, which assigns weights according to each classifier’s performance, measured by its iterative loss, without introducing additional manually set hyperparameters. Algorithm 1 provides the detailed training process of our method.
To implement the proposed method, the training process begins by initializing the encoder and classifiers with random weights. During each training epoch, a mini-batch of data is passed through the shared encoder to extract feature embeddings, which are then input to each classifier to generate predictions. The loss for each classifier is computed as a weighted combination of cross-entropy loss and focal loss, with dynamic weights assigned to balance their contributions. These weights are updated iteratively based on the relative performance of the classifiers, ensuring that classifiers with lower losses (indicating better performance) are assigned higher weights. The total loss for training is obtained by summing the weighted losses across all classifiers. During inference, the dynamic loss adaptive weighted voting strategy combines the predictions from all classifiers, giving more influence to better-performing classifiers. This framework ensures computational efficiency through a shared encoder and simple classifier architecture, while effectively addressing challenging samples to improve domain generalization performance.
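The structure described above, one shared encoder whose features are computed once and then passed to several independent classifier heads, can be sketched in plain Python. This is only an illustrative toy under stated assumptions: the linear-plus-ReLU encoder stands in for ResNet, and the class name `SharedEncoderEnsemble`, the helper `head_for_step`, and the round-robin update schedule are hypothetical and not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(n_in, n_out, rng):
    """Random weight matrix (n_out rows, n_in columns) for a toy linear layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class SharedEncoderEnsemble:
    """One shared encoder feeding several independent classifier heads."""

    def __init__(self, n_in, n_feat, n_classes, n_heads, seed=0):
        rng = random.Random(seed)
        self.encoder = make_linear(n_in, n_feat, rng)  # shared by all heads
        self.heads = [make_linear(n_feat, n_classes, rng) for _ in range(n_heads)]

    def encode(self, x):
        # Toy encoder (assumption): linear map followed by ReLU.
        return [max(0.0, h) for h in apply_linear(self.encoder, x)]

    def forward(self, x):
        feat = self.encode(x)  # computed once, reused by every head
        return [softmax(apply_linear(head, feat)) for head in self.heads]


def head_for_step(step, n_heads):
    """Assumed round-robin choice of which head is updated at a given step."""
    return step % n_heads


model = SharedEncoderEnsemble(n_in=4, n_feat=8, n_classes=3, n_heads=2)
probs = model.forward([0.5, -1.0, 2.0, 0.1])
schedule = [head_for_step(s, len(model.heads)) for s in range(4)]
```

Because the heads share the encoder output, adding a head costs only one extra single-layer classifier rather than a full model copy, which is the resource argument the text makes.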
Loss function
The loss function of our proposed simplified self-ensemble learning framework consists of only two parts, namely, the cross-entropy loss and the focal loss [58]. The cross-entropy loss lets each classifier update its gradients correctly, and the focal loss forces the model to mine samples that are harder to classify. The core idea is to use only the losses to compute the global weights, letting the model dynamically adjust the weights in each epoch. A dynamic adaptive loss-weighted voting strategy is then used during prediction to assign weights to each classifier more reasonably according to the change in the loss.
Focal loss. The focal loss was initially proposed to address the class imbalance problem. Class imbalance makes minority-class samples more difficult to classify than majority-class samples. Class imbalance and the difficult classification of complex samples are similar problems, and we found that both are common in domain generalization. Therefore, we introduce focal loss to balance the classification preferences of the classifier for different samples so that the model pays more attention to the samples that are difficult to classify. It can be calculated as follows:

$$L_{fl} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where $\alpha_t$ denotes a weighting factor, $\alpha_t \in [0, 1]$, $p_t$ reflects the degree of closeness of the prediction to the ground truth, and $\gamma$ stands for a regularization factor.
Dynamic adaptive loss weighting. We can evaluate the performance of a classifier based on the relative size of its losses, such as the cross-entropy loss or the focal loss. The lower the loss value, the better the classification performance, which indicates better generalization performance in the domain generalization problem. Therefore, a weight is computed for each classifier from the relative size of its loss $L_i$, where $L_i$ stands for the loss of the same type, $L_{ce}$ or $L_{fl}$, of the $i$-th classifier, and $i$ indexes the classifiers.
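To make the effect of the focal loss concrete, the following sketch compares it with the cross-entropy loss for one easy and one hard sample. It uses the standard focal loss form from the cited focal loss work; the values `alpha_t = 1.0` and `gamma = 2.0` are assumptions chosen for illustration, since the text does not state the settings used in the experiments.

```python
import math


def cross_entropy(p_t):
    """Cross-entropy for one sample, given the probability of the true class."""
    return -math.log(p_t)


def focal_loss(p_t, alpha_t=1.0, gamma=2.0):
    """Standard focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma.

    alpha_t and gamma are illustrative defaults (assumed, not from the paper).
    """
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_p, hard_p = 0.9, 0.3  # probability assigned to the true class

# Fraction of the cross-entropy loss that survives the focal modulation.
ratio_easy = focal_loss(easy_p) / cross_entropy(easy_p)  # (1 - 0.9) ** 2
ratio_hard = focal_loss(hard_p) / cross_entropy(hard_p)  # (1 - 0.3) ** 2
```

With these settings the easy sample keeps about 1% of its cross-entropy loss while the hard sample keeps about 49%, so the gradient signal is dominated by hard samples.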
Total loss. The total loss consists of an adaptive weighted cross-entropy loss and an adaptive weighted focal loss, expressed as follows:

$$L_{total} = w_{ce} L_{ce} + w_{fl} L_{fl}$$

where $w_{ce}$ and $w_{fl}$ are dynamic adaptive loss weights determined by $L_{ce}$ and $L_{fl}$, respectively.
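As a worked example of the weighted total loss, the sketch below derives both weights from the current loss values and combines the two losses. The normalization used here (each weight is its loss divided by the sum of the losses) is an assumption made for illustration. The text only says that the weights are determined by the losses, and the names `normalized_loss_weights` and `total_loss` are hypothetical.

```python
def normalized_loss_weights(losses):
    """Assumed weighting: each loss divided by the sum of all losses."""
    total = sum(losses)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(losses)] * len(losses)
    return [value / total for value in losses]


def total_loss(l_ce, l_fl):
    """Weighted sum of the cross-entropy loss and the focal loss."""
    w_ce, w_fl = normalized_loss_weights([l_ce, l_fl])
    return w_ce * l_ce + w_fl * l_fl


weights = normalized_loss_weights([0.8, 0.2])
combined = total_loss(0.8, 0.2)  # 0.8 * 0.8 + 0.2 * 0.2
```

Under this assumed form no extra hyperparameter is needed, and the term with the larger current loss receives the larger share of the total.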
Dynamic adaptive loss weighted voting strategy
When the loss uses adaptive weighting to assign different weights, we propose to weight the votes using the global properties of those weights. It is worth noting that, contrary to the idea of weighted loss, we need to give more weight to classifiers with smaller loss values [59]. Therefore, the dynamic adaptive loss-weighted voting strategy combines the classifier outputs using weights that decrease with the loss, where $C_i(x)$ represents the output of the $i$-th classifier, $N$ stands for the number of classifiers, and $\hat{y}$ denotes the final output of our simplified self-ensemble learning framework.
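The voting step can be illustrated with a small sketch in which classifiers with smaller training loss receive larger voting weights. The inverse-loss normalization below is one simple choice that satisfies that requirement and is an assumption, not the paper's confirmed formula. The function names and the toy outputs are hypothetical.

```python
def voting_weights(losses, eps=1e-12):
    """Assumed form: inverse-loss weights normalized to sum to 1."""
    inverse = [1.0 / (value + eps) for value in losses]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_vote(outputs, losses):
    """Combine per-classifier class probabilities into one prediction."""
    weights = voting_weights(losses)
    n_classes = len(outputs[0])
    scores = [
        sum(w * out[c] for w, out in zip(weights, outputs))
        for c in range(n_classes)
    ]
    return scores.index(max(scores)), scores


# Two classifiers that disagree on a two-class problem.
outputs = [[0.60, 0.40], [0.45, 0.55]]

# Equal losses reduce to plain averaging of the outputs.
uniform_label, uniform_scores = weighted_vote(outputs, losses=[1.0, 1.0])

# The second classifier has the smaller loss, so its vote counts more.
weighted_label, weighted_scores = weighted_vote(outputs, losses=[0.9, 0.3])
```

In this toy case plain averaging selects class 0, whereas the loss-aware weights (about 0.25 and 0.75) shift the decision to class 1, the prediction of the better-performing classifier.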
Experiments
This section mainly describes the details of the experiments performed on the three datasets. In the Dataset section, we first introduce the fundamental characteristics of the three benchmark datasets: OfficeHome, PACS, and VLCS. Second, we address critical details of the experimental design in the Implementation details section. Finally, we describe and analyze the experimental results in the Analysis section.
Dataset
The proposed method is validated on two real-world benchmark datasets and one virtual benchmark dataset, including OfficeHome, VLCS, and PACS. See below for details.
OfficeHome [60] is a real-world benchmark dataset that includes four domains, each of which includes 65 categories related to office supplies, with a total of 15,588 samples.
VLCS [61] is a common natural image benchmark dataset in domain generalization research. It includes four domains, each containing five categories of chairs, cars, birds, dogs, and people, with a total of 10,729 samples.
PACS [62] is another typical unnatural image benchmark dataset consisting of four domains: Sketch, Photo, Art Painting and Cartoon. Each domain includes guitar, house, horse, dog, elephant, giraffe, and person with a total of 9,991 samples.
Baselines
We selected five well-known domain generalization algorithms for a fair comparison with the proposed algorithm. (1) DANN [63] uses gradient reversal layers to learn domain-invariant deep features and thereby improve the model’s generalization ability. (2) GroupDRO [64] couples distributionally robust optimization (DRO) with regularization to improve generalization performance. (3) ANDMask [65] proposes uniform learning consistency and improves generalization by fusing multiple regularization strategies. (4) SagNet [66] performs domain-invariant representation learning through adversarial learning to reduce the image style bias of model learning. (5) VREx [67] proposes variance risk extrapolation to ensure out-of-distribution generalization.
Implementation details
We follow the DomainBed protocol and use the same experimental setup to ensure fair comparisons. Specifically, 80% of the source domain serves as the training set, the remainder as the validation set, and the complete target domain as the test set. Model selection is based on validation accuracy in the source domain, with the highest corresponding test accuracy taken as the final result. The dataset is split using ten random seeds, and the best result among these is reported. Hyperparameters are set as follows: 30 epochs, learning rate 1e-2, batch size 32, and weight decay 5e-4 for all datasets.
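The split described above can be sketched as follows: each source domain is shuffled with a seed and divided 80/20 into training and validation parts, while the held-out target domain is used in full as the test set. This is a minimal illustration, not DomainBed's actual code. The function name `leave_one_domain_out`, the toy data, and the `HPARAMS` dictionary layout are assumptions, although the hyperparameter values themselves are the ones stated in the text.

```python
import random

# Hyperparameter values as stated in the text; the dict layout is illustrative.
HPARAMS = {"epochs": 30, "learning_rate": 1e-2, "batch_size": 32, "weight_decay": 5e-4}


def leave_one_domain_out(domains, target, seed, train_frac=0.8):
    """Split source domains into train/val and keep the target domain as test."""
    rng = random.Random(seed)
    train, val = [], []
    for name, samples in domains.items():
        if name == target:
            continue  # the target domain is never seen during training
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train.extend(shuffled[:cut])
        val.extend(shuffled[cut:])
    test = list(domains[target])
    return train, val, test


# Toy data: four domains with ten (domain, index) samples each.
toy = {name: [(name, i) for i in range(10)] for name in ["A", "C", "P", "R"]}
train, val, test = leave_one_domain_out(toy, target="A", seed=0)
```

Repeating the call with ten different `seed` values would mirror the ten random splits mentioned in the protocol.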
Results
All experimental results described in this section are obtained from local personal PC training. We present the experimental results on the three datasets separately and give a complete analysis.
Results on OfficeHome. Our experimental results on the OfficeHome dataset are summarized in Table 1. OfficeHome contains 65 subdomain categories, making generalization more challenging due to its diversity. Here, A denotes the target domain Art, C denotes Clipart, P denotes Product, and R denotes Real World. The training domain excludes the target domain. For A, there are 13,161 training samples and 2,427 testing samples; for C, 11,223 training samples and 4,365 testing samples; for P, 11,149 training samples and 4,439 testing samples; and for R, 11,231 training samples and 4,357 testing samples. Our method, leveraging a simplified self-ensemble learning approach and weighted voting strategy, achieves superior performance across all four domains. In Clipart, the most challenging domain, our algorithm’s accuracy surpasses the second-best (DANN) by 0.89%. On average, it outperforms the second-best (VREx) by 0.66%, demonstrating clear advantages over recent state-of-the-art methods.
Results on PACS. Our experimental results on the PACS dataset are shown in Table 2. The PACS dataset contains many virtual images and fewer natural images, making it suitable for verifying generalization performance on synthetic images. Here, P represents the target domain Photo, A represents Art Painting, C represents Cartoon, and S represents Sketch. The training domain consists of all other domains excluding the target domain. When P is the target domain, there are 8,321 training samples and 1,670 testing samples. When A is the target domain, there are 7,943 training samples and 2,048 testing samples. When C is the target domain, there are 7,647 training samples and 2,344 testing samples. When S is the target domain, there are 6,062 training samples and 3,929 testing samples. Our method generalizes better than the other methods in the Photo and Art Painting domains, which have the most significant domain discrepancy. When Photo is the target domain, the prediction accuracy of our method is 2.15% higher than that of the second-best method (SagNet). When Art Painting is the target domain, it is 0.05% higher than that of the second-best method (SagNet). These results indicate that our method has a stronger generalization ability.
Results on VLCS. The VLCS dataset consists of natural images, resulting in a smaller domain gap compared to the other datasets. Here, V denotes the target domain VOC2007, L denotes LabelMe, C denotes Caltech101, and S denotes SUN09. The training domain consists of all other domains in the dataset, excluding the target domain. When V is the target domain, there are 7,353 training samples and 3,376 testing samples. When L is the target domain, there are 8,073 training samples and 2,656 testing samples. When C is the target domain, there are 9,314 training samples and 1,415 testing samples. When S is the target domain, there are 7,447 training samples and 3,282 testing samples. As seen from Table 3, our method outperforms the other methods and achieves the best performance in all four domains (VOC2007, LabelMe, Caltech101, and SUN09). Therefore, our method also performs well when the domain gap is small. As demonstrated in Table 4, the simplified self-ensemble learning model that integrates a dynamic adaptive loss-weighted voting strategy continues to exhibit superior performance in cross-dataset scenarios.
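The training and testing counts quoted in the three results paragraphs follow directly from the leave-one-domain-out setup: each training count is the dataset total minus the size of the held-out target domain. The short check below reproduces that arithmetic from the per-domain testing counts given in the text; the dictionary and function names are illustrative.

```python
# Per-domain sizes, taken from the testing counts reported in the text.
DOMAIN_SIZES = {
    "OfficeHome": {"A": 2427, "C": 4365, "P": 4439, "R": 4357},
    "PACS": {"P": 1670, "A": 2048, "C": 2344, "S": 3929},
    "VLCS": {"V": 3376, "L": 2656, "C": 1415, "S": 3282},
}


def training_sizes(domain_sizes):
    """Training count per target domain: dataset total minus the target's size."""
    total = sum(domain_sizes.values())
    return {name: total - size for name, size in domain_sizes.items()}


totals = {name: sum(sizes.values()) for name, sizes in DOMAIN_SIZES.items()}
train_counts = {name: training_sizes(sizes) for name, sizes in DOMAIN_SIZES.items()}
```

The computed totals are 15,588 for OfficeHome, 9,991 for PACS, and 10,729 for VLCS, and every derived training count matches the figure reported for the corresponding target domain.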
Analysis
In this section, we further analyze and discuss how the simplified self-ensemble learning framework and weighted voting strategy enhance the model’s generalization ability without increasing the number of models. Notably, we reveal the impact of the proposed algorithm by studying how changes in the composition of its loss function and learning strategy affect results on different datasets. Table 5, Table 6, and Table 7 present the results of ablation experiments conducted with different numbers of classifiers across three benchmark datasets. Additionally, Table 8, Table 9, and Table 10 showcase the results of the combinatorial ablation experiments performed on the same datasets. Table 11, Table 12, and Table 13 present the results of the ablation experiments comparing the adaptive dynamically-weighted loss-voting strategy and the fixed-proportional-voting strategy across three benchmark datasets. Here, $L_{ce}$ denotes the cross-entropy loss, $L_{fl}$ denotes the focal loss, and $W$ denotes the adaptive dynamically weighted loss-voting strategy.
Main findings
From the above tables, we can draw the following conclusions:
(1) Experiments on three benchmark datasets reveal that the model performs best when N = 2 classifiers are used. Using fewer high-quality classifiers improves generalization and accuracy by reducing noise and complexity. For example, on OfficeHome the average accuracy is 62.01% at N = 2, dropping to 56.40% at N = 3, 55.41% at N = 4, and 53.17% at N = 5. On PACS, average accuracy peaks at 81.45% (N = 2) and decreases to 78.14% (N = 5); on VLCS, it declines from 75.31% (N = 2) to 72.88% (N = 5). While increasing the number of classifiers may reduce overfitting on some datasets, adding low-quality or redundant classifiers often introduces noise, and in weighted voting low-quality voters can overshadow the correct assessments of high-quality ones, reducing decision accuracy. These results indicate that two high-quality classifiers are sufficient to achieve stable and accurate results while avoiding unnecessary complexity and noise.
(2) With only $L_{ce}$ and $L_{fl}$ included, the average accuracy on OfficeHome improves by 1.64% compared to when only $L_{ce}$ is included, the average accuracy on PACS improves by 0.70%, and the average accuracy on VLCS is 1.97% higher. The experimental results indicate that the generalization ability of the simplified self-ensemble learning framework, which consists of a single encoder and dual classifiers, surpasses that of the original encoder. Moreover, the simplified self-ensemble learning framework without the weighted voting strategy already outperforms several representative algorithms. For instance, it achieves an average accuracy of 61.14% on OfficeHome, surpassing DANN (59.05%), ANDMask (56.22%), and GroupDRO (58.09%). These results demonstrate that the focal loss introduced by the second classifier is more effective in focusing on the complex samples within the training domain. These complex samples are a crucial factor in differentiating subclasses across various target domains, posing a challenge to the model’s generalization ability. The proposed simplified self-ensemble learning framework demonstrates significant robustness and generalization capabilities.
(3) With the inclusion of only $L_{ce}$ and $W$, the average accuracy on PACS improves by 0.10% and the average accuracy on VLCS improves by 1.00% compared to using only the $L_{ce}$ loss function, while the average accuracy on OfficeHome is 0.09% lower than when only $L_{ce}$ is included. The experimental results indicate that when the same cross-entropy loss is applied to multiple classifiers and the weighted voting strategy is implemented, the model’s generalization performance is slightly enhanced when the number of subdomains is small. When the number of subdomains in the training domain is substantial (e.g., 65 subdomains in OfficeHome), enhancing classification accuracy and generalization becomes challenging, even with the use of multiple classifiers.
(4) Incorporating $L_{ce}$, $L_{fl}$, and $W$ significantly improves generalization performance across all scenarios (OfficeHome, PACS, VLCS). The results show that the proposed dynamic loss-weighted voting strategy enhances generalization without modifying the model structure or the number of models.
(5) Ablation experiments that fix the voting proportion between the two classifiers demonstrate the advantage of the adaptive dynamic weighted loss voting strategy: a manually fixed proportion tends to be less effective than weights derived from the model’s own learning. This is because manually set hyperparameters cannot accurately guide the model toward generalization across different datasets.
In addition, the t-SNE algorithm is employed to reduce the dimensionality of the output of the model’s final layer, resulting in the visualization presented in Fig 2. This figure shows that SSEL separates the classification boundaries between the various target sub-domains more clearly and reduces the confusion observed at the boundaries of the baseline algorithm. In conclusion, the results of the ablation studies above validate the model proposed in this paper. The algorithm code and dataset used in this study are publicly available on GitHub: https://github.com/Marzsccc.
The left panel presents the baseline results obtained without employing any domain generalization (DG) algorithm, whereas the right panel shows the performance of our proposed SSEL framework. Notably, the classification boundaries and inter-class distances in the right panel are markedly improved relative to the baseline in the left panel. This comparison was conducted on the PACS dataset.
Limitations and future recommendations
This study has several limitations. For instance, due to hardware constraints, deeper encoder architectures that could potentially improve feature extraction were not explored, which might have limited the maximum achievable performance of the model. Furthermore, the research primarily centered on the simplified self-ensemble learning framework, leaving other complementary approaches, such as advanced data augmentation techniques and robust domain-invariant representation learning, insufficiently investigated.
In future work, we aim to address these limitations by exploring more comprehensive solutions. Specifically, we plan to investigate lightweight yet deeper encoder architectures, to integrate advanced data augmentation strategies, and to develop hybrid models that effectively combine ensemble learning with domain-specific adaptations. Additionally, we intend to evaluate the proposed framework on diverse and larger-scale datasets to ensure its broader applicability and robustness.
Conclusion
In this paper, we propose a simplified self-ensemble learning framework and a novel adaptive weighted voting strategy with dynamic losses. The proposed framework achieves superior ensemble performance without requiring additional model copies or saving iterative parameters, indicating that a shared encoder does not compromise the generalization capability of the classifiers. The dynamic loss adaptive weighted voting strategy lets each classifier receive a voting weight determined by its loss during training, so that better-performing classifiers have greater influence on the final prediction. As the performance of the other classifiers improves, the weights are adjusted accordingly. In addition, by introducing focal loss, the model focuses on mining the information in complex samples and uses only the loss value to learn hyperparameters, avoiding the influence of manual parameter adjustment. Experiments conducted on three benchmark datasets (OfficeHome, PACS, and VLCS) demonstrate that the proposed algorithm achieves more robust generalization performance than existing methods. Specifically, our method addresses two critical challenges: (1) Conventional ensemble learning methods require multiple model copies and parameter averaging, which increases computational resource demands. In contrast, our framework trains a single model, significantly reducing computational costs while maintaining generalization performance using homogeneous classifiers. (2) Existing voting strategies rarely consider dynamic weight allocation among classifiers with varying performance. Our dynamic loss-based adaptive voting strategy assigns weights adaptively based on loss variations during training.
Combined with focal loss, this strategy effectively enhances the ensemble’s generalization performance. While the proposed framework demonstrates promising results, there is room for further exploration to enhance its capabilities and applicability. Future work could focus on incorporating deeper and more efficient encoder architectures to improve feature extraction and scalability to larger datasets. Additionally, advanced data augmentation strategies could be employed to diversify training data and better address domain gaps. Another valuable direction would be to integrate the framework with complementary approaches, such as meta-learning and domain-invariant representation learning, to create hybrid models that capitalize on the strengths of multiple methodologies. Furthermore, evaluating the framework in more diverse and challenging real-world scenarios, such as autonomous driving and medical image analysis, would help establish its robustness and practical value across various domains.
References
- 1. Stacke K, Eilertsen G, Unger J, Lundstrom C. Measuring domain shift for deep learning in histopathology. IEEE J Biomed Health Inform 2021;25(2):325–36. pmid:33085623
- 2. Luo Y, Zheng L, Guan T, et al. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019.
- 3. Sanchez J, Deschaud JE, Goulette F. Domain generalization of 3d semantic segmentation in autonomous driving. Proceedings of the IEEE/CVF International Conference on Computer Vision; 2023.
- 4. Khosravian A, Amirkhani A, Masih‐Tehrani M, Yazdanijoo A. Multi‐domain autonomous driving dataset: Towards enhancing the generalization of the convolutional neural networks in new environments. IET Image Process 2022;17(4):1253–66.
- 5. Hu S, Fang Z, Deng Y, et al. Toward full-scene domain generalization in multi-agent collaborative bird’s eye view segmentation for connected and autonomous driving. IEEE Transactions on Intelligent Transportation Systems; 2024.
- 6. Yoon J, Oh K, Shin Y. Domain generalization for medical image analysis: A review. Proc IEEE. 2024.
- 7. Zhang R, Xu Q, Huang C, Zhang Y, Wang Y. Semi-Supervised Domain Generalization for Medical Image Analysis. 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI); 2022. p. 1–5.
- 8. Yan S, Yu Z, Liu C, et al. Prompt-driven latent domain generalization for medical image classification. IEEE Trans Med Imag. 2024.
- 9. Ahmad Qureshi S, Hussain L, Rafique M, Sohail H, Aman H, Rahat Abbas S, et al. EML-PSP: A novel ensemble machine learning-based physical security paradigm using cross-domain ultra-fused feature extraction with hybrid data augmentation scheme. Expert Syst Appl. 2024;243:122863.
- 10. Zhu W, Mousavi SM, Beroza GC. Seismic signal augmentation to improve generalization of deep neural networks. Adv Geophys. 2020:151–77.
- 11. Chen J, Gao Z, Wu X, Luo J. Meta-causal learning for single domain generalization. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023. p. 7683–92.
- 12. Zhang Z, Wang B, Jha D, et al. Domain generalization with correlated style uncertainty. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; 2024. p. 2000–9.
- 13. Zhou K, Liu Z, Qiao Y. Domain generalization: A survey. IEEE Trans Pattern Anal Mach Intell. 2022;44(12):12345–67.
- 14. Wang J, Lan C, Liu C. Generalizing to unseen domains: A survey on domain generalization. IEEE Trans Knowl Data Eng. 2022;35(8):8052–72.
- 15. Yu H, Zhang X, Xu R, Liu J, He Y, Cui P. Rethinking the evaluation protocol of domain generalization. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2024. p. 21897–908.
- 16. Jiang L, Wu J, Zhao S, et al. Domain-invariant feature learning with label information integration for cross-domain classification. Neural Comput Appl. 2024:1–20.
- 17. Zou Y, Luo C, Zhang J. DIFLD: Domain invariant feature learning to detect low-quality compressed face forgery images. Complex Intell Syst 2023;10(1):357–68.
- 18. Xie Y, Shi J, Gao C, Yang G, Zhao Z, Guan G, et al. Rolling bearing fault diagnosis method based on dual invariant feature domain generalization. IEEE Trans Instrum Meas. 2024;73:1–11.
- 19. Zhang X, Cui P, Xu R, et al. Deep stable learning for out-of-distribution generalization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021.
- 20. Shen Z, Cui P, Zhang T, Kuang K. Stable learning via sample reweighting. AAAI 2020;34(04):5692–9.
- 21. Yu H, Cui P, He Y, Shen Z, Lin Y, Xu R, et al. Stable learning via sparse variable independence. AAAI 2023;37(9):10998–1006.
- 22. Polikar R. Ensemble learning. Ensemble Machine Learning. Springer; 2012.
- 23. Yang Y, Lv H, Chen N. A survey on ensemble learning under the era of deep learning. Artif Intell Rev 2022;56(6):5545–89.
- 24. Zhou K, Yang Y, Qiao Y, Xiang T. Domain adaptive ensemble learning. IEEE Trans Image Process. 2021;30:8008–18. pmid:34534081
- 25. Mienye ID, Sun Y. A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access. 2022;10:99129–49.
- 26. Dong X, Yu Z, Cao W, Shi Y, Ma Q. A survey on ensemble learning. Front Comput Sci 2019;14(2):241–58.
- 27. Sagi O, Rokach L. Ensemble learning: A survey. Wiley Interdiscip Rev: Data Mining Knowl Discov. 2018;8(4):e1249.
- 28. Ang EPW, Lin S, Kot AC. Diverse deep feature ensemble learning for Omni-domain generalized person re-identification. Proceedings of the 2024 9th International Conference on Multimedia and Image Processing; 2024. p. 64–71.
- 29. Li X, Wang W, Wu L. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv Neural Inform Process Syst. 2020;33:21002–12.
- 30. Nguyen A, Tran T, Gal Y. Domain invariant representation learning with domain density transformations. Adv Neural Inform Process Syst. 2021;34:5264–75.
- 31. Hu W, Liu B, Zeng H. Adversarial decoupling and modality-invariant representation learning for visible-infrared person re-identification. IEEE Trans Circuits Syst Video Technol. 2022;32(5):1234–45.
- 32. Hou Z, Yu B, Tao D. Batchformer: Learning to explore sample relationships for robust representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022.
- 33. Lee S, Jeon H, Kang U. Multi-EPL: Accurate multi-source domain adaptation. PLoS One 2021;16(8):e0255754. pmid:34352030
- 34. Mancini M, Akata Z, Ricci E, et al. Towards recognizing unseen categories in unseen domains. European Conference on Computer Vision. Springer; 2020.
- 35. Li P, Li D, Li W, et al. A simple feature augmentation for domain generalization. Proceedings of the IEEE/CVF International Conference on Computer Vision; 2021.
- 36. Xu Q, Zhang R, Zhang Y. A Fourier-based framework for domain generalization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021.
- 37. Jiang W, Zhang K, Wang N, Yu M. MeshCut data augmentation for deep learning in computer vision. PLoS One 2020;15(12):e0243613. pmid:33362231
- 38. Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. International Conference on Machine Learning; 2017.
- 39. Li D, Yang Y, Song YZ, et al. Learning to generalize: Meta-learning for domain generalization. Proceedings of the AAAI Conference on Artificial Intelligence; 2018.
- 40. Shu Y, Cao Z, Wang C. Open domain generalization with domain-augmented meta-learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021.
- 41. Zhao Y, Zhong Z, Yang F, et al. Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021. p. 6277–86.
- 42. Deng T, Xue C, Zhang G. Data modeling analysis of GFRP tubular filled concrete column based on small sample deep meta learning method. PLoS One 2024;19(7):e0305038. pmid:38985781
- 43. Dolan MJ, Ore A. Metalearning and data augmentation for mass-generalized jet taggers. Phys Rev D. 2022;105(9).
- 44. Zhang M, Huang S, Wang D. Domain generalized few-shot image classification via meta regularization network. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2022. p. 3748–52.
- 45. Shui C, Wang B, Gagné C. On the benefits of representation regularization in invariance based domain generalization. Mach Learn 2022;111(3):895–915. pmid:35510180
- 46. Tian C, Li H, Xie X. Neuron coverage-guided domain generalization. IEEE Trans Pattern Anal Mach Intell. 2022.
- 47. Wang H, Bai X, Wang S, Tan J, Liu C. Generalization on unseen domains via model-agnostic learning for intelligent fault diagnosis. IEEE Trans Instrum Meas. 2022;71:1–11.
- 48. Zhu Y, Wu X, Qiang J, Hu X, Zhang Y, Li P. Representation learning with deep sparse auto-encoder for multi-task learning. Pattern Recogn. 2022;129:108742.
- 49. He K, Zhang X, Ren S. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
- 50. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM 2017;60(6):84–90.
- 51. Li W, Xu Z, Xu D, Dai D, Van Gool L. Domain generalization and adaptation using low rank exemplar SVMs. IEEE Trans Pattern Anal Mach Intell 2018;40(5):1114–27. pmid:28534767
- 52. Malisiewicz T, Gupta A, Efros AA. Ensemble of exemplar-svms for object detection and beyond. Proceedings of the 2011 International Conference on Computer Vision; 2011.
- 53. Xu Z, Li W, Niu L. Exploiting low-rank structure from latent domains for domain generalization. European Conference on Computer Vision; 2014.
- 54. Fan X, Wang Q, Ke J. Adversarially adaptive normalization for single domain generalization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021.
- 55. Segu M, Tonioni A, Tombari F. Batch normalization embeddings for deep domain generalization. Pattern Recogn. 2022:109115.
- 56. Zhuang Z, Wei L, Xie L, Ai H, Tian Q. Camera-based batch normalization: An effective distribution alignment method for person re-identification. IEEE Trans Circuits Syst Video Technol 2022;32(1):374–87.
- 57. Kim D, Yoo Y, Park S. Selfreg: Self-supervised contrastive regularization for domain generalization. Proceedings of the IEEE/CVF International Conference on Computer Vision; 2021.
- 58. Lin T, Goyal P, Girshick R. Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision; 2017.
- 59. Kuncheva LI, Rodríguez JJ. A weighted voting framework for classifiers ensembles. Knowl Inf Syst 2012;38(2):259–75.
- 60. Venkateswara H, Eusebio J, Chakraborty S, et al. Deep hashing network for unsupervised domain adaptation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017.
- 61. Fang C, Xu Y, Rockmore DN. Unbiased metric learning: On the utilization of multiple datasets and web images for softening bias. 2013 IEEE International Conference on Computer Vision; 2013. p. 1657–64.
- 62. Li D, Yang Y, Song YZ. Deeper, broader and artier domain generalization. Proceedings of the IEEE International Conference on Computer Vision; 2017.
- 63. Ganin Y, Ustinova E, Ajakan H. Domain-adversarial training of neural networks. J Mach Learn Res. 2016;17(1):2096–30.
- 64. Sagawa S, Koh P, Hashimoto T, et al. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint. 2019.
- 65. Parascandolo G, Neitz A, Orvieto A. Learning explanations that are hard to vary. arXiv preprint. 2020.
- 66. Nam H, Lee H, Park J, et al. Reducing domain gap by reducing style bias. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021.
- 67. Krueger D, Caballero E, Jacobsen JH, et al. Out-of-distribution generalization via risk extrapolation (rex). Proceedings of the International Conference on Machine Learning; 2021.