Abstract
In the field of computer vision, the task of image annotation and classification has attracted much attention due to its wide demand in applications such as medical image analysis, intelligent surveillance, and image retrieval. However, existing methods have significant limitations in dealing with unknown target domain data, manifested as reduced classification accuracy and insufficient generalization ability. To this end, the study proposes an adaptive image annotation classification model for open-set domains based on dynamic threshold control and a subdomain alignment strategy, addressing the impact of the distribution difference between the source and target domains on classification performance. The model combines the channel attention mechanism to dynamically extract important features, optimizes the cross-domain feature alignment effect using dynamic weight adjustment and the subdomain alignment strategy, and balances the classification performance of known and unknown categories through dynamic threshold control. Experiments are conducted on the ImageNet and COCO datasets, and the results show that the proposed model reaches a classification accuracy of up to 93.5% in the unknown target domain and 89.6% in the known target domain, exceeding the best results of existing methods. Meanwhile, the model's precision and recall reach up to 89.6% and 90.7%, respectively, and the classification time is only 1.2 seconds, significantly improving both classification accuracy and efficiency. These results show that the method can effectively improve the robustness and generalization ability of image annotation and classification in open-set scenarios, and it provides a new idea for solving the domain adaptation problem in real scenarios.
Citation: Li S, Chang Z, Liu H (2025) Application of open domain adaptive models in image annotation and classification. PLoS One 20(5): e0322836. https://doi.org/10.1371/journal.pone.0322836
Editor: Shahul Hameed K A, Sethu Institute of Technology, INDIA
Received: September 3, 2024; Accepted: March 28, 2025; Published: May 14, 2025
Copyright: © 2025 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The research is supported by National Social Science Foundation of China in 2022: Research on Evaluation System and Guarantee Mechanism of Labor Rights and Interests of Flexible Employees in Platform Enterprises (No.22XJY004). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: DA, Domain Adaptation; ODA, Open Domain Adaptation; ODA-DTC, Dynamic Threshold Control - Open Domain Adaptation; P, Precision; R, Recall; AP, Average-Precision; DANN, Domain Adversarial Neural Networks; CADA, Conditional Adversarial Domain Adaptation; SEDA, Self-ensembling for Visual Domain Adaptation; L2T, Transfer Learning via Learning to Transfer; JAN, Joint Adaptation Networks; ADDA, Adversarial Discriminative Domain Adaptation.
Introduction
In the digital age, the explosive growth of image data poses unprecedented challenges and opportunities for computer vision [1,2]. Image Annotation and Classification (IAC) has long been a key focus in both industry and the vision research community, and is widely used in intelligent auxiliary processes such as intelligent driving, image retrieval, and healthcare; its importance is self-evident [3]. The advancement of technology and the diversification of application scenarios have raised the requirements for the accuracy and generalization ability of IAC algorithms. Traditional IAC methods rely on manually designed features and shallow models, which achieved some success in the early stages. However, owing to the limitations of manual features and the simplicity of the models, these traditional methods often perform poorly in complex practical application scenarios; when facing large-scale datasets and inter-domain differences in particular, their performance is limited by feature representation capacity and generalization ability. With the growth of deep learning, image processing methods based on deep neural networks have emerged, which can automatically learn complex feature representations and greatly improve the performance of image processing tasks [4,5]. J. Li et al. designed a deep label-specific feature learning model by combining deep convolutional networks and label embedding to enhance the alignment of specific labels in image classification [6]. This model could capture the dependency between image labels, improving the effectiveness of image classification. N. A. Koohbanani et al. proposed a Self-Path Model (SPM) combining self-supervised Convolutional Neural Networks (CNN) to obtain detailed information from high-resolution pathological images.
When there was little or no available labeled data in the Target Domain (TD), SPM could improve the domain adaptability of tissue pathology image classification to achieve the goal of image depth detection [7]. S. Zhang et al. proposed a novel semantic fully supervised model by combining self-supervision and semantic features to improve the accuracy of existing 3D medical imaging tasks [8]. This model could effectively accelerate convergence and improve the accuracy of various 3D medical imaging tasks such as classification, segmentation, and detection. To further optimize the classification results of multi-modal image and text data, N. Xu et al. proposed a cross-modal image and text data classifier after combining sentiment analysis. The effectiveness of this classifier in improving the performance of multi-modal classification tasks was superior to traditional models [9]. Although the above research results can make technological improvements in their respective fields, even deep learning models may encounter insufficient generalization ability when applied to new and unprecedented domain data. In addition, most existing image processing methods focus on closed set conditions. In many real-world scenarios, there are significant differences between the distribution of training data and testing data, namely the Source Domain (SD) and TD, due to differences in collection conditions, heterogeneity of devices, or differences in data annotation methods. This leads to Domain Adaptation (DA) issues. Especially under open-set conditions, where the TD has new categories that have not been seen in the SD, traditional DA methods face significant challenges. Regarding this issue, domestic and foreign research has also explored it one after another. G. Chen et al. designed an open-set recognition accidental estimation model to improve the empirical classification performance of known data with open-set labeling. 
The proposed method was significantly superior to other existing methods and achieved advanced classification performance [10]. To improve the recognition performance of pneumonia X-ray images in the open domain of medical images, Z. P. Jiang et al. constructed an improved VGG16 pneumonia image classification model. This model produced superior test results compared with current best-practice CNNs for medical image recognition [11]. A. Maracani et al. found that open-set data collected in situ by underwater plankton sensors often suffer from serious imbalance. They therefore designed a new transfer learning pipeline for plankton image classification; the average efficiency of open-set image classification transfer learning under this method was about 6% higher than other methods [12]. To improve the resolution effectiveness of ultrasound images of Papillary Thyroid Carcinoma (PTC), X. Ai et al. introduced an open dataset for capsule network training and proposed an ultrasound image classification and diagnosis model [13]. The accuracy of PTC feature classification under this model was 81.06%, far exceeding other traditional methods. To enhance the effectiveness of machine learning techniques in open-set adaptive domain image classification, J. Parmar et al. combined natural language processing techniques and proposed a novel image classification method. Experimental results showed that the accuracy of open-set adaptive image classification under this method was higher than that of traditional methods, and the classification was more adaptive [14]. H. Chen et al. proposed a new method for stable classification of hyperspectral images based on superpixel principal component analysis and a random patch network.
The method not only exploits a data-driven approach but also efficiently takes global and local spectral knowledge into account at the superpixel level. Test results on several open-set domain image classification datasets show that it significantly outperforms several current state-of-the-art methods [15].
In summary, image annotation and classification techniques have played an important role in computer vision in recent years and are widely used in scenarios such as medical image analysis, intelligent surveillance, and image retrieval. However, traditional supervised learning methods depend heavily on the distribution of the training data. When the data distributions of the source and target domains differ, and especially in open-domain scenarios where the target domain may contain categories never seen in the source domain, the model's generalization ability becomes insufficient and classification accuracy drops significantly. These limitations indicate that existing techniques remain inadequate for open-domain image annotation and classification tasks. To address these issues, the research focuses on open-domain image annotation and classification, specifically: the effective alignment of source and target domain features, the efficient utilization of unlabeled data in the target domain, the impact of dynamic threshold adjustment on classification performance, and the refinement of feature distributions in complex scenes.
An optimized open-domain adaptive image annotation classification model (ODA-IAC) is proposed. Its main highlights include: the introduction of a domain adaptation (DA) algorithm to reduce the differences in feature distributions between the source and target domains; a dynamic threshold control (DTC) module, which improves the model's classification accuracy on unknown categories by adjusting thresholds in real time to adapt to changes in the sample distribution; a channel attention mechanism (CAM) to enhance the expression of important features; and a subdomain alignment strategy (SAS) to further refine the feature distribution within the target domain and improve the model's adaptability to complex scenes. The novelty of the research lies in the integrated use of dynamic threshold control, the channel attention mechanism, and the subdomain alignment strategy, whose organic combination resolves the performance bottleneck of traditional domain adaptation algorithms in open-domain scenarios. The expected contribution is an efficient method for processing images from unknown target domains, with the potential for wide practical application.
1. Methods and materials
This study focuses on domain differences in image classification and proposes a dynamic-threshold-controlled ODA-DTC model by combining CAM with the ODA algorithm. Then, to address the unreliable classification results caused by the fixed threshold of the ODA-DTC model, the study introduces dynamic weight adjustment (DWA) and SAS for optimization, and finally proposes the optimized open-domain adaptive image annotation classification model, ODA-IAC.
1.1 Construction of the ODA-IAC model
Traditional image annotation models typically use supervised learning to train on annotated datasets. However, due to domain differences between different datasets, these models often have poor generalization ability on new datasets [16]. Therefore, this study attempts to use existing annotation knowledge to label unknown data. DA algorithm is a type of machine learning algorithm aimed at solving the problem of mismatched data distribution between the SD and TD. The main reason for choosing the domain adaptive algorithm is that the significant difference in the data distribution between the source and target domains leads to the poor performance of traditional supervised learning methods in the target domain, especially in the open-domain scenarios, where the target domain contains new categories that have not been seen in the source domain, which makes the generalization ability of the model challenging. The domain adaptive algorithm solves the problem of inconsistent distribution of training and test data by aligning the feature distributions of the source and target domains, thus improving the classification accuracy of the model on the target domain [17-19]. Fig 1 is a schematic diagram of ODA.
In Fig 1, ODA can allow for the existence of classes unknown to the SD and, through appropriate strategies, solve the generalization problem of unknown TDs. However, general ODA may be limited by the quality and quantity of unlabeled TD data, which can affect model performance. In open-domain scenarios, target-domain data usually contain unknown categories whose distribution differs significantly from that of the source domain. Fixed-threshold models struggle to balance the classification performance of known and unknown categories, which may lead to misclassification of unknown-category data. DTC effectively improves the model's ability to recognize unknown categories by dynamically adjusting the threshold according to sample importance. Therefore, this study proposes an ODA-DTC model, whose structure is shown in Fig 2.
In Fig 2, ODA-DTC can be divided into four major modules: the domain module, the channel attention module, the labeling module, and the adaptive module. The domain module is used to input source and target domain data for feature generation and distribution alignment; the channel attention module improves the expression of key features through weighting; the annotation module realizes accurate annotation of known and unknown categories through the dynamic thresholding mechanism; and the adaptive module dynamically adjusts inter-domain differences in combination with the sub-domain alignment strategy to enhance the feature alignment of target domain data. In terms of process, a convolutional neural network first extracts multi-scale image features from low to high levels to provide rich basic information for subsequent modules. Second, the channel attention mechanism is introduced to enhance the expression of important features through adaptive pooling and weighting operations, avoiding interference from redundant information during feature extraction. In addition, the dynamic threshold control module adjusts the feature extraction weights in real time according to the importance of the image samples, optimizing the expression ability of the features in diverse scenes. Fig 3 shows the CAM structure.
In Fig 3, the whole module contains the original image layer (L1), pooling layer (L2), convolutional layer (L3), Sigmoid function layer (L4), channel weights layer (L5) and feature layer (L6). The original feature map is first input and processed through Adaptive Pooling (ATP), Average Pooling (AVP), and Maximum Pooling (MP), respectively. ATP uses 2D convolution for channel information learning. AVP and MP enhance the attention of local and global features. Then, the weights of each channel are calculated through 1D convolution and Sigmoid function, and finally, after weighting, the feature map is output [20]. It can be seen that CAM is able to weight different channels according to the importance of features, effectively enhancing the focus on key regions. Compared with the traditional fully connected layer or convolutional operation, this mechanism has stronger feature expression capability, especially superior in dealing with complex cross-domain data distribution. The calculation formula for channel weights is equation (1).
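The pooling-plus-convolution weighting just described can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the fusion of the three pooling branches and the 1D kernel values are assumptions for demonstration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, kernel):
    """Sketch of the CAM weighting: pool per channel, run a 1D convolution
    across channels (replacing a fully connected layer), apply Sigmoid,
    and re-weight the feature map.

    feat: feature map of shape (C, H, W).
    kernel: 1D kernel applied across the channel dimension (assumed values).
    """
    C = feat.shape[0]
    avg_pool = feat.mean(axis=(1, 2))        # AVP: global average per channel
    max_pool = feat.max(axis=(1, 2))         # MP: global max per channel
    adp_pool = 0.5 * (avg_pool + max_pool)   # ATP branch, approximated here
    desc = avg_pool + max_pool + adp_pool    # fuse the three branches (assumption)
    pad = len(kernel) // 2
    padded = np.pad(desc, pad, mode="edge")  # keep length C after convolution
    conv = np.array([np.dot(padded[i:i + len(kernel)], kernel) for i in range(C)])
    weights = sigmoid(conv)                  # per-channel weights in (0, 1)
    return weights, feat * weights[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))
weights, out = channel_attention(feat, np.array([0.25, 0.5, 0.25]))
```

Because the Sigmoid keeps every channel weight strictly between 0 and 1, the re-weighted map never amplifies a channel in absolute terms; it only attenuates the less important ones.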
In equation (1), σ denotes the Sigmoid function; Favg, Fmax, and Fatp denote the outputs of AVP, MP, and ATP, respectively; and X represents the original image, whose weighting calculation is equation (2). In equation (2), X′ represents the weighted original feature image. From the above formulas, compared with general attention modules, the CAM used in this study removes the fully connected layer and uses 1D convolution, which effectively preserves the information exchange between channels while reducing the dimensionality of complex features. In addition, to improve the distribution similarity between the SD and the TD, this study first trains the model using labeled samples in the SD to learn the image features of the SD. During this process, the loss function of the classifier is equation (3).
In equation (3), x_i^s and y_i^s respectively denote the i-th labeled image in the SD and the corresponding label for that image; L_ce and C represent the cross-entropy loss and the classifier, respectively; n_s is the number of images in the SD; and G is the feature generator. After learning the feature information of labeled images in the SD, this study introduces a threshold τ to train the model. The threshold is determined through an adversarial game between the known label images and the unknown images. If the discrimination probability of an unknown image is biased towards τ, the classifier has a smaller distribution error between the SD and TD, and the recognition success rate will be higher. Specifically, in the open-domain scenario, the source and target domain data distributions may differ considerably, and rare samples in the target domain are especially likely to be neglected. By dynamically adjusting the training weights, the model can learn rare-sample features more efficiently, balancing the classification performance of known and unknown categories and improving the robustness of the model on unbalanced datasets. The calculation of the error is equation (4).
In equation (4), n_t represents the number of unlabeled images in the TD, and p_{j,c} is the probability that the j-th unlabeled image in the TD belongs to the c-th image category. To improve the alignment of image data between the two domains, this study draws on the central idea of generative adversarial networks [21]. By generating adversarial samples, the model is exposed to more diverse inputs during training, which enhances its generalization ability. This not only improves the model's performance on known image classification, but also significantly improves its ability to handle unknown images. In adversarial training, the generator produces realistic data samples, while the discriminator distinguishes between real and generated data as accurately as possible. This study treats the feature extraction network as the generator, with the SD providing the real samples and the TD the generated samples. Through adversarial iteration between the generator's final output samples and the discriminator, adaptive alignment between the two domains is ultimately achieved. At the same time, this study introduces a threshold τ as a constraint, defining that if the predicted probability of a sample image in the TD is greater than τ, it is marked as a known label image. The constraint process is equation (5). In equation (5), x^t is an image in the TD; p(x^t) is the predicted probability of the sample image in the TD; and x^t_known and x^t_unk are the known and unknown images in the TD, respectively. In summary, this study combines the above modules with DTC to form the operational process of the DTC-ODA model, as shown in Fig 4.
Fig 4 illustrates the operational flow of the DTC-ODA model. The model contains nine main links: source and target domain determination, feature generation, computation of weights, generation of feature maps, dynamic threshold judgment, feature updating, adversarial training, parameter updating, and image classification. First, the source and target domain data are fed into the feature generator for feature extraction, and the generated feature map is then weighted by the channel attention module to enhance the attention paid to important features. On this basis, the dynamic threshold judgment module decides the category of each image according to its feature weights, thereby initially completing the classification of target domain images. The model then further optimizes the classification results through the feature update and adversarial training modules to ensure feature alignment between the source and target domains, and outputs the image classification at the end. In addition, by sharing the feature extraction layer, the multi-task learning strategy can fully exploit the complementary information between different tasks to further improve the overall performance of the model; in unknown image classification in particular, it can use the boundary information of segmentation tasks to assist classification decisions. Finally, the parameters that satisfy the threshold conditions are used for image classification.
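The dynamic threshold decision at the heart of this flow (the constraint of equation (5)) reduces, for a single sample, to a simple probability comparison. A minimal sketch, using the balanced threshold value of 0.6 reported later in the experiments:

```python
import numpy as np

def label_with_threshold(probs, tau=0.6):
    """Mark a target-domain sample 'known' when its predicted known-class
    probability exceeds the threshold tau, otherwise 'unknown'."""
    return np.where(np.asarray(probs) > tau, "known", "unknown")

preds = label_with_threshold([0.92, 0.55, 0.71, 0.30], tau=0.6)
# preds -> ['known', 'unknown', 'known', 'unknown']
```

In the full model, tau itself is adjusted during training (equation (11)) rather than held fixed.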
1.2 Optimization of the ODA-IAC model
The ODA-DTC image annotation model proposed in the previous section can effectively handle unknown images in the SD and TD and perform alignment. However, this threshold control method has limitations: if the threshold cannot be flexibly adjusted to the real-time training needs of the model, the reliability and validity of the training results may decrease, and negative transfer may occur [22,23]. In view of this, this study introduces SAS, which divides large open-set adaptive domains into sub-domains and represents them accordingly, and introduces weight parameters for dynamic weight adjustment (DWA) [24]. The optimized ODA-IAC model is shown in Fig 5.
In Fig 5, the optimization model can be segmented into 6 modules, namely source DM, target DM, feature extraction, Sub-domain Alignment (SDA), label classifier, and adaptive weights. SDA refers to segmenting a wide range of SD or TD and aligning features in a sub-domain manner to enhance attention to local feature information and improve the accuracy of image classification. The subdomain alignment module was chosen for the study because of its ability to capture feature differences between the source and target domains at a finer granularity, avoiding the problem of category confusion due to global feature averaging in full domain alignment. The schematic diagram of SDA is Fig 6.
Fig 6(a) and 6(b) are schematic diagrams of global alignment and SDA. By comparison, global alignment tends to focus more on the overall features of the image, ignoring the individual differences of different image samples in the domain, making the features of various types of images in the domain prone to confusion. SAS is more capable of displaying individual feature differences in images, mining obvious features for annotation and classification. The calculation formula for SAS is equation (6).
In equation (6), G represents the feature extraction function; n_s and n_t are the numbers of samples in the SD and TD; d is the difference measurement; λ is the regularization parameter that balances the two objectives; and L_sda is the SDA loss function. Furthermore, the core of SAS is to enhance the feature expression ability between internal individuals or sub-groups while maintaining overall domain alignment [25]. This study therefore optimizes the differences among sub-domain features, with the optimized results shown in equation (7).
In equation (7), S_k^s and S_k^t are the sets of image samples in the k-th sub-domain of the SD and TD, and L_intra is the optimization loss function for intra-domain differences. By optimizing the internal differences and the loss function, a more refined SAS is achieved, which reduces the feature differences between corresponding sub-domains across the two domains while increasing the feature separation between different sub-domains, supporting more accurate image classification and annotation. In addition, the adaptive weight module is divided into two parts: similarity prediction and adaptive weight generation. Through this dynamic adjustment, the similarity reference value between known images and TD images (TDIs) is improved, thereby avoiding the risk of negative transfer [26]. The loss function for similarity prediction is equation (8).
In equation (8), L_sim is the loss function for similarity prediction; x_i^s and y_i^s are the image samples and their labels in the SD; G(x_i^s) denotes the extracted features of the image samples in the SD; n_s is the total number of SD image (SDI) samples; L_ce denotes the cross-entropy loss function; and D is the output of the similarity prediction. The similarity loss function generated by adaptive weights, L_w, is equation (9).
In equation (9), p^s represents the known-image probability of the top-ranked image samples in the SD during the similarity prediction stage, and p^t is the probability that an image in the TD is recognized as an unknown image. If p^s and p^t are both close to 1, all image data in the SD will be successfully recognized. To improve the accuracy of image similarity comparison between the TD and the SD, this study pre-trains the label classifier module and dynamically controls its weight during the training process. The process is equation (10).
In equation (10), d^s and d^t are the discrimination probabilities in similarity prediction of known images in the SD and of known images in the TD, respectively, and w is the similarity weight during the training process. The size of this weight represents the similarity between an image sample in the TD and the known images: the larger the value, the closer the TDI is to a known image [27,28]. In addition, to match the characteristics of model training, namely a large sample size with low accuracy in the early stage and a small sample size with high accuracy in the later stage, this study dynamically adjusts the threshold τ, as shown in equation (11). In equation (11), p_k is the probability of a known image in the TD. By monitoring the training loss and gradient changes of the model in real time, the data enhancement parameters are dynamically adjusted so that the enhanced samples better match the learning needs of the current model. Based on the optimization of the above modules, this study proposes the ODA-IAC optimization model, as shown in Fig 7.
In Fig 7, the whole model has 11 segments. That is, determination of SD and TD, feature extraction, sub-domain alignment, calculating channel weights, SD similarity prediction, feature map generation, weight generation, threshold judgment, TD similarity calculation, parameter updating and image classification. First, the SDI data is given, and then the TDI data is input. After feature extraction by a feature extractor, it is separated into sub-domains. Subsequently, alignment operations are performed between sub-domains to transfer image features, making the distribution of domain features of the same type more similar. Finally, the label classifier module is used for label image recognition between the SD and TD, and the similarity between the TDI and the known image is compared using dynamic weights during the process.
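To make the sub-domain alignment idea of equations (6) and (7) concrete, the following sketch aligns class-wise sub-domain statistics between the two domains. The mean-difference (linear-kernel, MMD-style) discrepancy and the weight `lam` are assumptions for illustration; the paper does not specify the exact difference measurement:

```python
import numpy as np

def subdomain_alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels, lam=0.1):
    """Hedged sketch of a sub-domain alignment loss: for each sub-domain
    (here, each class), penalize the squared distance between the mean
    source feature and the mean target feature. lam plays the role of
    the regularization parameter in equation (6)."""
    loss = 0.0
    classes = set(src_labels) & set(tgt_labels)
    for c in classes:
        s = src_feats[src_labels == c].mean(axis=0)   # source sub-domain center
        t = tgt_feats[tgt_labels == c].mean(axis=0)   # target sub-domain center
        loss += np.sum((s - t) ** 2)                  # align matching sub-domains
    return lam * loss / max(len(classes), 1)

rng = np.random.default_rng(1)
src = rng.normal(size=(20, 16))
tgt = src + 0.01 * rng.normal(size=(20, 16))          # nearly aligned target
src_y = np.repeat([0, 1], 10)
tgt_y = np.repeat([0, 1], 10)
aligned = subdomain_alignment_loss(src, src_y, tgt, tgt_y)
shifted = subdomain_alignment_loss(src, src_y, tgt + 2.0, tgt_y)
```

A target domain shifted away from the source produces a much larger loss than a nearly aligned one, which is the signal the adaptive module minimizes during training.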
2. Results
This study first used classification accuracy as the indicator to determine the optimal values of the three thresholds/weights in the model. Popular models of the same type were then introduced for effectiveness comparison, and the Precision-Recall (PR) curves and their areas, as well as the precision and recall of each model, were tested. In addition, this study conducted visual comparisons, confusion matrix comparisons, and accuracy comparisons between different models through simulation testing to verify the effectiveness and superiority of the proposed model.
2.1 Performance testing of optimized ODA-IAC models
To validate model performance, the study uses ImageNet and COCO as open-set datasets, dividing the data into a training set (80%) and a test set (20%). Training was carried out on an Intel Core i7 CPU and an NVIDIA GeForce GPU with 32 GB of RAM, with the model implemented in Python; the learning rate was set to 0.002, the batch size to 64, and training ran for 200 epochs. The model was trained with a cross-entropy loss function and the Adam optimizer, using dynamic threshold adjustment and subdomain alignment optimization. In the testing phase, the model is evaluated with classification accuracy, precision, recall, and F1 value, and is compared against traditional domain adaptive models to ensure comprehensive and reliable results. ImageNet is a large-scale image dataset containing over 14 million images, with hundreds to thousands of images per category. COCO contains over 330,000 images, each with instance annotations for multiple objects, and is widely used for image recognition, object detection, and segmentation. The images used in the study are RGB three-channel JPEGs at a resolution of 224 × 224 pixels, drawn from ImageNet and COCO, and were normalized (resizing and pixel normalization) to match the model's input requirements. To determine the optimal values of the thresholds τ1 and τ2 in dynamic threshold control, this study used the Known Image Classification Accuracy (KICA, ACC-K) and Unknown Image Classification Accuracy (UICA, ACC-UNK) as test indicators. ACC-K measures the classification accuracy of the model on images of known categories; ACC-UNK measures it on images of unknown categories. The results are shown in Fig 8.
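As a point of reference for the setup just described, the reported configuration and the cross-entropy loss used for the classifier can be written as follows (a hedged sketch: the paper reports only the hyper-parameters, not source code):

```python
import numpy as np

# Training configuration reported in the paper:
# Adam optimizer, learning rate 0.002, batch size 64, 200 epochs.
CONFIG = {"lr": 0.002, "batch_size": 64, "epochs": 200}

def cross_entropy(logits, labels):
    """Softmax cross-entropy, the classifier loss used during training."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(2)
logits = rng.normal(size=(CONFIG["batch_size"], 10))        # one batch of class scores
labels = rng.integers(0, 10, size=CONFIG["batch_size"])
loss = cross_entropy(logits, labels)
```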
Fig 8(a) shows the test results for threshold τ1, and Fig 8(b) shows the results for threshold τ2. In Fig 8(a), as τ1 increased, KICA gradually decreased while UICA continuously improved, indicating that the model's IAC performance was gradually improving. When the two indicators intersected, the model's performance was balanced; at that point, τ1 was 0.6 and ACC-K was 67%. In Fig 8(b), when τ2 was 0.7, the classification accuracy in the TD first reached its peak, with an ACC-UNK of 89%. In short, when τ1 was 0.6, the known and unknown IAC performance of the model was most uniform, and when τ2 was 0.7, the annotation classification performance on unknown images in the TD was best. The study then tested the dynamic weight w, which is adjusted according to sample importance during model training, under both types of data, as shown in Fig 9.
Fig 9(a) and 9(b) show the dynamic weight test results on the ImageNet and COCO datasets. In Fig 9, as the number of iterations increased, the accuracy of model annotation classification gradually improved: on ImageNet and COCO, the highest classification accuracies of 83% and 88% were reached at 180 and 250 iterations, respectively. In the early stages of training, the number of known image samples available for training is relatively small, so a larger dynamic weight reduces the impact of insufficient training samples. As the model's performance gradually improves and stabilizes, overly large dynamic weights instead suppress training. A moderate dynamic weight of 0.6 balances training performance and sample consumption, keeping the model in its best-performing state. This study then validated the contribution of each module in the optimized model through ablation testing, with Precision (P), Recall (R), and Average Precision (AP) as reference indicators. The AP metric measures the area under the PR curve, i.e., the overall classification performance. The PR curves and their areas S with respect to the coordinate axes were plotted, as shown in Fig 10.
Fig 10(a)–10(d) present the PR test curves and areas of the base ODA algorithm, ODA-DTC, ODA-DTC-dynamic weight (ODA-DTC-DW), and ODA-DTC-DW-SDA. The PR curve of the most basic ODA algorithm descended the fastest, with an AP of only 0.747. After adding channel attention, image annotation classification improved markedly, with an AP increase of 0.14. After further introducing DWA and SAS, the model’s recognition of unknown images improved significantly through the optimized training and annotation classification process, and the maximum AP reached 0.953. The study then introduced currently popular ODA-IAC models, namely Domain-Adversarial Neural Networks (DANN), Conditional Adversarial Domain Adaptation (CADA), and Self-Ensembling for Visual Domain Adaptation (SEDA) [29-31]. To evaluate performance in cross-domain data migration, migration tests were conducted in both directions: training on the source domain and classifying target domain images, and training on the target domain and classifying source domain images. Monte Carlo dropout was also used to estimate the confidence of the different models, assessing their reliability and generalization ability in cross-domain migration. The confidence score reflects how confident a model is when dealing with unknown categories, which enhances its interpretability. The test results are shown in Table 1.
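The AP values reported above are areas under the PR curve. The helper below shows one standard way to compute AP from ranked classifier scores (summing precision at each positive hit weighted by the recall increment); it is a generic evaluation utility, not the authors’ code.

```python
def average_precision(scores, labels):
    """AP as the sum of precision values at each positive hit,
    weighted by the recall increment (area under the PR curve).
    `labels` are 1 for positive samples, 0 for negatives."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            ap += (tp / rank) * (1.0 / n_pos)  # precision * delta-recall
    return ap
```

A perfect ranking (all positives ahead of all negatives) yields AP = 1.0; the 0.747 vs. 0.953 gap in Fig 10 corresponds to how far each ablation variant’s ranking falls short of that.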
In Table 1, the proposed model performs well in the cross-domain migration test: its precision, recall, and F1 values are significantly higher than those of the other models in both the known- and unknown-domain classification tasks. In the known domain, the proposed model achieves a precision of 89.6% and a recall of 90.5%, with an F1 value of 90%, an annotation consistency of 0.91, and the shortest running time of 1.8 seconds. In the unknown domain, i.e., on data that are unlabeled or entirely absent from the source-domain training data, precision and recall are 89.3% and 90.7%, respectively, with an F1 value of 90.2%; annotation consistency further improves to 0.92, and the shortest running time is only 1.2 seconds. In contrast, models such as DANN and CADA are competitive but have longer running times and lower annotation consistency, especially in the unknown-domain migration test, where their highest F1 value is only 84.5%. The SEDA model performs relatively well in annotation consistency, but its F1 value and runtime still fall short of the proposed model. Meanwhile, the confidence analysis further verifies the stability and reliability of the ODA-IAC model in handling unknown categories: its confidence scores reach 0.91 and 0.93 in the known and unknown domains, respectively, significantly higher than those of the other models. The model thus not only provides high-precision classification results but also makes predictions with higher confidence, which is crucial for reducing the risk of misclassification.
In contrast, the confidence scores of DANN and CADA are 0.76 and 0.81, respectively, showing markedly weaker adaptability in the unknown domain, while SEDA’s confidence score of 0.87 is still lower than that of ODA-IAC, suggesting that its handling of cross-domain data migration remains limited.
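Monte Carlo dropout, as used above for confidence estimation, keeps dropout active at inference and averages the softmax outputs of several stochastic forward passes; the maximum of the mean class probabilities then serves as a confidence score. The one-layer “model” and its weights below are purely illustrative stand-ins for the paper’s networks.

```python
import math, random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mc_dropout_confidence(x, weights, p_drop=0.5, passes=100, seed=0):
    """Average the softmax over `passes` stochastic forward passes of a
    one-layer linear model; confidence = max of the mean probabilities."""
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(passes):
        # Dropout mask on the input features, rescaled by 1/(1-p).
        mask = [0.0 if rng.random() < p_drop else 1.0 / (1 - p_drop)
                for _ in x]
        logits = [sum(w_i * m_i * x_i for w_i, m_i, x_i in zip(row, mask, x))
                  for row in weights]
        mean = [a + b / passes for a, b in zip(mean, softmax(logits))]
    return max(mean)
```

A model whose passes disagree (logits near zero) collapses toward a confidence of 1/num_classes, while consistent passes push the score toward 1.0, which is why the score separates reliable from unreliable cross-domain predictions.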
2.2 Simulation testing for the optimized ODA-IAC model
Similarly, to verify the simulation performance of the ODA-IAC, this study selected the VisDA dataset as the data source for simulation testing. The VisDA dataset contains both a source domain (synthetic images) and a target domain (real images), covering 12 widely distributed categories to maximize the diversity and representativeness between the source and target domains [32-34]. Fig 11 shows example data samples from VisDA.
The source domain images were colorless and more obviously synthetic, while the target domain images were more realistic and detailed. This study first introduced visual testing methods to compare image classification models similar to ODA-DTC-DW-SDA, namely Transfer Learning via Learning to Transfer (L2T), Joint Adaptation Networks (JAN), and Adversarial Discriminative Domain Adaptation (ADDA) [35-37]. To gain a deeper understanding of how the models classify in open-set scenarios, and especially how they adapt to the transfer between source and target domains, the study introduced SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) to visualize the decision boundaries for image classification. With SHAP and LIME, the study can accurately track how a model weights different features when classifying target domain images, and can visualize the model’s behavior when adapting to unknown categories. In particular, during domain transfer, the model optimizes classification performance by adjusting its decision boundary to respond appropriately to the feature differences between the source and target domains. The test results are shown in Fig 12.
Fig 12(a)–12(d) show the visualization results of image type classification for L2T, JAN, ADDA, and the research model. As can be seen in Fig 12, the classification decision boundaries of L2T and JAN are looser, and the feature distributions of some target domain samples are not fully aligned with the source domain, resulting in more misclassifications of unknown categories. The ADDA model improves domain alignment somewhat, but its classification boundary still shows local offsets, especially for unseen categories, and its feature mapping is more ambiguous. In contrast, the visualization of the proposed ODA-IAC model on the VisDA dataset shows that the domain alignment strategy significantly optimizes the classification boundaries: features of the same category aggregate more tightly and features of different categories are more widely separated, improving the model’s adaptability on the target domain. The model’s predictions were then compared with the actual results for a detailed evaluation of classification performance, as shown in Fig 13.
Fig 13(a) and 13(b) show the confusion matrix results of ADDA and the research model. The ADDA model performed only moderately across the seven source- and target-domain image categories: six categories reached a classification score above 80 points, and only four scored above 90 points. The research model scored above 80 on seven categories and above 90 on six. Although the research model still made some errors when recognizing and classifying a small number of image types, its overall performance was better than ADDA’s. The reason is that the research model adopts the DWA mechanism to adaptively adjust the alignment strategy between the SD and TD, thereby achieving inter-domain feature alignment and classifier training more effectively and improving generalization. The study then used a larger real-world dataset, the Domain Adaptation Dataset (DomainNet), which contains about 600,000 images covering 345 categories (animals, daily objects, transportation, etc.) drawn from six domains: clip art, infographics, paintings, quick drawings, real photographs, and sketches. More advanced domain-adversarial methods were also introduced on DomainNet: Domain-Adversarial Neural Network (DANN), Conditional Adversarial Domain Adaptation (CADA), and Self-Ensembling for Visual Domain Adaptation (SEDA). ACC-K and ACC-UNK were used as indicators to test the migration performance of the four models between source and target domain images under different lighting conditions; the results are shown in Table 2.
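The per-class scores read off the confusion matrices in Fig 13 are simply the diagonal of the row-normalized matrix. A small generic helper (illustrative, with made-up counts in the usage below) makes this explicit:

```python
def per_class_accuracy(confusion):
    """Diagonal of the row-normalized confusion matrix: the fraction
    of samples of each true class that were predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]
```

For example, `per_class_accuracy([[8, 2], [1, 9]])` yields `[0.8, 0.9]`, i.e., 80% and 90% accuracy for the two classes.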
In Table 2, under low-light conditions, the classification accuracy of traditional methods such as DANN and CADA decreases due to feature instability, whereas the proposed model optimizes feature alignment between the source and target domains through dynamic threshold adjustment, maintaining a high level of classification performance in the target domain with an ACC-UNK of 93.5%. Under normal lighting, the proposed model’s ACC-K and ACC-UNK reach 93.2% and 92.8%, respectively, showing strong adaptability in typical lighting environments; under strong lighting, ACC-K and ACC-UNK still reach 90.9% and 91.5%. This indicates that the dynamic threshold adjustment and subdomain alignment strategies effectively reduce the impact of domain changes on classification decisions, making the source domain features more adaptable to the target domain and enhancing the model’s generalization across environments. Furthermore, during feature adaptation between the source and target domains, the model aligns the two data distributions by introducing DA: source domain features are extracted and mapped into the target domain’s feature space through a feature generator and a channel attention mechanism. This process ensures that features learned in the source domain can be effectively applied in the target domain by minimizing the feature differences between the two. To further optimize this adaptation, the study employs SAS, which divides the broad source and target domains into multiple subdomains and adapts the source domain features at a finer granularity, allowing target domain features to match the source subdomains more accurately and reducing inter-domain distributional differences.
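A minimal sketch of the subdomain-alignment idea: instead of matching whole-domain statistics, features are grouped by (predicted) class and the mean feature of each source subdomain is pulled toward that of the matching target subdomain. This is a simplified, linear-kernel stand-in for the paper’s SAS, and all feature vectors and labels in it are hypothetical.

```python
def subdomain_alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels):
    """Mean squared distance between class-conditional feature means of
    source and target, averaged over classes present in both domains."""
    def class_means(feats, labels):
        sums, counts = {}, {}
        for f, y in zip(feats, labels):
            acc = sums.setdefault(y, [0.0] * len(f))
            sums[y] = [a + b for a, b in zip(acc, f)]
            counts[y] = counts.get(y, 0) + 1
        return {y: [v / counts[y] for v in s] for y, s in sums.items()}

    sm = class_means(src_feats, src_labels)
    tm = class_means(tgt_feats, tgt_labels)
    shared = set(sm) & set(tm)
    loss = 0.0
    for y in shared:
        loss += sum((a - b) ** 2 for a, b in zip(sm[y], tm[y]))
    return loss / max(len(shared), 1)
```

Minimizing this term during training pulls each target subdomain toward its source counterpart, which is the finer-grained alignment the text describes; in practice the target labels would be pseudo-labels predicted by the classifier.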
Under strong light conditions, the ACC-K and ACC-UNK values of the proposed model reached 90.9% and 91.5%, respectively, still leading the compared models; in particular, the ACC-UNK of CADA dropped to 85.2% under strong light, while the proposed model remained highly stable. Taken together, the proposed model adapts better to changes in lighting conditions during cross-domain migration and shows stronger generalization and robustness, thanks to the effective combination of dynamic threshold adjustment and the subdomain alignment strategy, which yields finer feature extraction and better classification across lighting environments. In addition, to suit real-time scenarios, the study integrates a parallel computing strategy that ensures stability and low latency when processing high-throughput data through GPU acceleration and distributed inference techniques.
3. Discussion and conclusion
3.1 Discussion
IAC is an important research direction in computer vision, and traditional IAC methods typically require large amounts of annotated data as training samples [38]. Given this, this study proposed a novel optimized ODA-IAC model by introducing an ODA model and combining CAM, DWA, and SAS. In the dynamic threshold adjustment test, a clear trade-off between KICA and UICA was observed as the threshold was adjusted: with the threshold set to 0.6, KICA reached 67%, and raising it to 0.7 increased UICA to 89%. This result indicated that an appropriately tuned threshold can effectively balance the model’s performance on known and unknown image classification, thereby improving overall classification accuracy. This finding aligned with Y Wei et al., who demonstrated the efficacy of dynamic threshold adjustment strategies in enhancing ODA models [39]. In addition, on the ImageNet and COCO datasets, the model’s classification accuracy peaked at 83% and 88%, respectively, as the iterations increased. These results highlighted the important role of the DWA mechanism and SAS in optimizing the training process, especially for efficiency and accuracy on large-scale datasets. By comparison, the research model outperformed other popular domain adaptive models, such as DANN, CADA, and SEDA, on the ImageNet and COCO datasets. Especially when dealing with unknown TD data, the research model exhibited higher accuracy and efficiency. This not only confirmed the effectiveness of the model’s structure and strategy, but was also consistent with the current state of the art, further demonstrating the application potential of open-set domain adaptive models in IAC tasks. This result was consistent with W Jiao et al.’s research [40].
Although the research model has made significant progress in the accuracy and generalization ability of IAC, its performance in handling unknown TD data in extreme situations still needs to be optimized. For example, in cases of extremely imbalanced samples, the model performance may be affected. In addition, this study mainly focuses on static images, and its adaptability to IAC tasks in video or dynamic scenes has not been thoroughly explored. Future research directions can further explore and optimize the strategies of models in handling imbalanced samples and extreme unknown data to improve the model robustness. It is recommended that the model be applied to dynamic scenes or video data to ascertain its performance in such complex scenarios.
3.2 Conclusion
Aiming at the performance problems of traditional IAC methods, the study sought to improve the accuracy and generalization ability of IAC by developing a new ODA model. Building on ODA, it introduced CAM, DWA, and SAS to construct a comprehensive framework for processing known and unknown image data, namely the optimized ODA-IAC model. The experimental data showed that with the threshold set to 0.6 and 0.7, respectively, the model achieved its best ACC-K and ACC-UNK values of 67% and 89%. With the dynamic weight set to 0.6, the model achieved its best classification accuracies on the ImageNet and COCO datasets, at 83% and 88%. Compared with models of the same type, it had a maximum P value of 89.6%, a maximum R value of 90.7%, a maximum F1 value of 90.2%, and a minimum time of 1.2 s for annotating and classifying the target images. In addition, during simulation testing, the visual classification results of the research model were the best, with more compact image clusters. The highest ACC-K value for transferring SDI to TDI was 89.6%, and the highest ACC-UNK value for transferring TDI to SDI was 93.5%. The model’s annotation classification results for the 7 image types showed 7 types scoring above 80 and 6 scoring above 90. In summary, the proposed model significantly improves classification performance on multiple open-set datasets, indicating clear advantages in accuracy and generalization for classifying unknown TD data. However, this study also has limitations; for example, under extremely complex domain conditions, the adaptability and accuracy of the model may still need further improvement. Future research will explore more strategies to optimize the model structure and algorithms, especially for extreme domain differences and large-scale datasets, to achieve wider application and higher accuracy.
References
- 1. Choi C, Kampffmeyer M, Handegard NO, Salberg A-B, Brautaset O, Eikvil L, et al. Semi-supervised target classification in multi-frequency echosounder data. ICES J Mar Sci. 2021;78(7):2615–27.
- 2. Dowden B, De Silva O, Huang W, Oldford D. Sea ice classification via deep neural network semantic segmentation. IEEE Sensors J. 2021;21(10):11879–88.
- 3. Srinivasu PN, SivaSai JG, Ijaz MF, Bhoi AK, Kim W, Kang JJ. Classification of skin disease using deep learning neural networks with MobileNet V2 and LSTM. Sensors (Basel). 2021;21(8):2852. pmid:33919583
- 4. Lin C-C, Kuo C-H, Chiang H-T. CNN-based classification for point cloud object with bearing angle image. IEEE Sensors J. 2022;22(1):1003–11.
- 5. Lu S, Li Y, Wang M, Gao F. Mirror invariant convolutional neural networks for image classification. IET Image Process. 2022;16(6):1626–35.
- 6. Li J, Zhang C, Zhou JT, Fu H, Xia S, Hu Q. Deep-LIFT: deep label-specific feature learning for image annotation. IEEE Trans Cybern. 2022;52(8):7732–41. pmid:33566780
- 7. Koohbanani NA, Unnikrishnan B, Khurram SA, Krishnaswamy P, Rajpoot N. Self-Path: self-supervision for classification of pathology images with limited annotations. IEEE Trans Med Imaging. 2021;40(10):2845–56. pmid:33523807
- 8. Zhang S, Li Z, Zhou H-Y, Ma J, Yu Y. Advancing 3D medical image analysis with variable dimension transform based supervised 3D pre-training. Neurocomputing. 2023;529:11–22.
- 9. Xu N, Mao W, Wei P, Zeng D. MDA: Multimodal data augmentation framework for boosting performance on sentiment/emotion classification tasks. IEEE Intell Syst. 2021;36(6):3–12.
- 10. Chen G, Peng P, Wang X, Tian Y. Adversarial reciprocal points learning for open set recognition. IEEE Trans Pattern Anal Mach Intell. 2022;44(11):8065–81. pmid:34428133
- 11. Jiang Z-P, Liu Y-Y, Shao Z-E, Huang K-W. An Improved VGG16 model for pneumonia image classification. Appl Sci. 2021;11(23):11185.
- 12. Maracani A, Pastore VP, Natale L, Rosasco L, Odone F. In-domain versus out-of-domain transfer learning in plankton image classification. Sci Rep. 2023;13(1):10443. pmid:37369770
- 13. Ai X, Zhuang J, Wang Y, Wan P, Fu Y. ResCaps: an improved capsule network and its application in ultrasonic image classification of thyroid papillary carcinoma. Complex Intell Syst. 2021;8(3):1865–73.
- 14. Parmar J, Chouhan S, Raychoudhury V, Rathore S. Open-world machine learning: applications, challenges, and opportunities. ACM Comput Surv. 2023;55(10):1–37.
- 15. Chen H, Wang T, Chen T, Deng W. Hyperspectral image classification based on fusing S3-PCA, 2D-SSA and random patch network. Remote Sens. 2023;15(13):3402.
- 16. Lu Z, Whalen I, Dhebar Y, Deb K, Goodman ED, Banzhaf W, et al. Multiobjective evolutionary design of deep convolutional neural networks for image classification. IEEE Trans Evol Computat. 2021;25(2):277–91.
- 17. Mamat N, Othman MF, Abdulghafor R, Alwan AA, Gulzar Y. Enhancing image annotation technique of fruit classification using a deep learning approach. Sustainability. 2023;15(2):901.
- 18. Abdou MA. Literature review: efficient deep neural networks techniques for medical image analysis. Neural Comput Applic. 2022;34(8):5791–812.
- 19. Wang S, Li C, Wang R, Liu Z, Wang M, Tan H, et al. Annotation-efficient deep learning for automatic medical image segmentation. Nat Commun. 2021;12(1):5915. pmid:34625565
- 20. Liu B-Y, Fan K-J, Su W-H, Peng Y. Two-stage convolutional neural networks for diagnosing the severity of alternaria leaf blotch disease of the apple tree. Remote Sens. 2022;14(11):2519.
- 21. Dhaka VS, Meena SV, Rani G, Sinwar D, , Ijaz MF, et al. A survey of deep convolutional neural networks applied for prediction of plant leaf diseases. Sensors (Basel). 2021;21(14):4749. pmid:34300489
- 22. Huang S-C, Chen C-C, Lan J, Hsieh T-Y, Chuang H-C, Chien M-Y, et al. Deep neural network trained on gigapixel images improves lymph node metastasis detection in clinical settings. Nat Commun. 2022;13(1):3347. pmid:35688834
- 23. Liang W, Liang Y, Jia J. MiAMix: enhancing image classification through a multi-stage augmented mixed sample data augmentation method. Processes. 2023;11(12):3284.
- 24. Korot E, Guan Z, Ferraz D, Wagner SK, Zhang G, Liu X, et al. Code-free deep learning for multi-modality medical image classification. Nat Mach Intell. 2021;3(4):288–98.
- 25. Roschewitz M, Khara G, Yearsley J, Sharma N, James JJ, Ambrózay É, et al. Automatic correction of performance drift under acquisition shift in medical image classification. Nat Commun. 2023;14(1):6608. pmid:37857643
- 26. Bazi Y, Bashmal L, Rahhal MMA, Dayil RA, Ajlan NA. Vision transformers for remote sensing image classification. Remote Sens. 2021;13(3):516.
- 27. Ji Z, Yu X, Yu Y, Pang Y, Zhang Z. Semantic-guided class-imbalance learning model for zero-shot image classification. IEEE Trans Cybern. 2022;52(7):6543–54. pmid:34043516
- 28. Hasanvand M, Nooshyar M, Moharamkhani E, Selyari A. Machine learning methodology for identifying vehicles using image processing. AIA. 2023;1(3):154–62.
- 29. Kwak G-H, Park N-W. Unsupervised domain adaptation with adversarial self-training for crop classification using remote sensing images. Remote Sensing. 2022;14(18):4639.
- 30. Liu X-Q, Ding X-Y, Luo X, Xu X-S. Unsupervised domain adaptation via class aggregation for text recognition. IEEE Trans Circuits Syst Video Technol. 2023;33(10):5617–30.
- 31. Camalan S, Cui K, Pauca VP, Alqahtani S, Silman M, Chan R, et al. Change detection of amazonian alluvial gold mining using deep learning and sentinel-2 imagery. Remote Sens. 2022;14(7):1746.
- 32. Zhao Z, Chen Y, He X. Adaptively heterogeneous transfer learning for hyperspectral image classification. Remote Sens Lett. 2022;13(12):1182–93.
- 33. Chen G, Chen Q, Long S, Zhu W, Yuan Z, Wu Y. Quantum convolutional neural network for image classification. Pattern Anal Applic. 2022;26(2):655–67.
- 34. Li W, Liu X. ADDA: An Adversarial direction-guided decision-based attack via multiple surrogate models. Mathematics. 2023;11(16):3613.
- 35. Li X, Wang X, Chen X, Lu Y, Fu H, Wu YC. Unlabeled data selection for active learning in image classification. Sci Rep. 2024;14(1):424. pmid:38172266
- 36. Wang C, Huang J, Lv M, Wu Y, Qin R. Dual-branch adaptive convolutional transformer for hyperspectral image classification. Remote Sens. 2024;16(9):1615.
- 37. Gera D, Raj Kumar BV, Badveeti NSK, Balasubramanian S. Dynamic adaptive threshold based learning for noisy annotations robust facial expression recognition. Multimed Tools Appl. 2023;83(16):49537–66.
- 38. Salar A, Ahmadi A. Improving loss function for deep convolutional neural network applied in automatic image annotation. Vis Comput. 2024;40(3):1617–29.
- 39. Wei Y, Zhou Y. Spatial-aware network for hyperspectral image classification. Remote Sensing. 2021;13(16):3232.
- 40. Jiao W, Hao X, Qin C. The image classification method with CNN-XGBoost model based on adaptive particle swarm optimization. Information. 2021;12(4):156.