
An improved sample selection framework for learning with noisy labels

  • Qian Zhang ,

    Roles Conceptualization, Funding acquisition, Methodology, Writing – original draft

    usstzhangqian@163.com

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Yi Zhu,

    Roles Supervision

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Ming Yang,

    Roles Supervision

    Affiliation School of Computer and Electronic Information, Nanjing Normal University, Nanjing, Jiangsu, China

  • Ge Jin,

    Roles Validation

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Yingwen Zhu,

    Roles Validation

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Yanjun Lu,

    Roles Visualization

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Yu Zou,

    Roles Visualization

    Affiliations School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China, School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science & Technology, Nanjing, Jiangsu, China

  • Qiu Chen

    Roles Project administration, Writing – review & editing

    Affiliation Department of Electrical Engineering and Electronics, Graduate School of Engineering, Kogakuin University, Tokyo, Japan

Abstract

Deep neural networks have powerful memory capabilities, yet they frequently suffer from overfitting to noisy labels, leading to a decline in classification and generalization performance. To address this issue, sample selection methods that filter out potentially clean labels have been proposed. However, there is a significant gap in size between the filtered, possibly clean subset and the unlabeled subset, which becomes particularly pronounced at high-noise rates. Consequently, this results in underutilizing label-free samples in sample selection methods, leaving room for performance improvement. This study introduces an enhanced sample selection framework with an oversampling strategy (SOS) to overcome this limitation. This framework leverages the valuable information contained in label-free instances to enhance model performance by combining an SOS with state-of-the-art sample selection methods. We validate the effectiveness of SOS through extensive experiments conducted on both synthetic noisy datasets and real-world datasets such as CIFAR, WebVision, and Clothing1M. The source code for SOS will be made available at https://github.com/LanXiaoPang613/SOS.

Introduction

Deep neural networks (DNNs) are widely adopted for various vision tasks such as object detection [1], course recommendation [2], segmentation [3], image captioning [4], image denoising [5], and corporate relative valuation [6], because of their excellent learning capabilities. However, DNNs require high-quality annotated training data, which can be costly to collect and may contain data redundancy [7]. Image data sourced from the internet often require manual annotation through crowdsourcing platforms, a process that consumes a significant amount of time and resources, making it less efficient. Furthermore, the crowdsourcing mechanism, which involves cross-annotating and voting, introduces instances with inaccurate annotations, known as noisy labels. Unfortunately, the robust memory capabilities of DNNs can lead to models overfitting noisy labels, resulting in reduced accuracy and discrimination. To address this challenge, one approach is to manually label a portion of the samples and treat the remaining samples as unlabeled data, processing them using semi-supervised techniques [8] and data quality assessment methods [9, 10] to improve performance. Another approach is to enhance model robustness through learning with noisy labels (LNL) methods. Early research in LNL, including techniques such as loss adjustment [11, 12], noise transition matrix estimation using robust architectures [13–17], label correction based on DNN predictions [18–24], robust loss functions [25–28], and sample selection [29–41], has shown promising results.

Loss adjustment involves modifying the loss of all training samples before backward propagation in DNNs to mitigate the influence of samples with noisy labels. Some approaches focus on designing robust architectures to model the noise transition matrix T ∈ [0,1]^{c×c}, which reflects the probability of one category being mislabeled as another, i.e., Tij := p(yη = j | y = i), where c is the number of categories, yη = j means the noisy label of a sample belongs to the j-th class, and y = i means the corresponding ground-truth label belongs to the i-th class. Researchers have also explored using DNNs to correct noisy labels, leveraging the predictive capabilities of these models; this technique is called label correction. [42] has demonstrated that using a suitably modified loss function enables models trained on noisy datasets to achieve optimal Bayes risk, similar to their performance on clean datasets. Consequently, research on robust loss functions focuses on designing functions that enable models to effectively learn from clean labels while avoiding overfitting to noisy samples. Among numerous research avenues, sample selection methods have garnered widespread attention due to their state-of-the-art (SOTA) performance and ability to purify noisy datasets. As such, they have become mainstream in current LNL research. As the term suggests, sample selection aims to detect noisy labels and then filter out a subset of potentially clean labels for training. Early sample selection methods used dedicated architectures or training strategies to identify and remove potentially noisy samples. Although these methods achieved advanced performance, they left the information within noisy samples unutilized. Consequently, current SOTA sample selection methods incorporate semi-supervised learning (SSL) techniques for robust training, treating noisy samples as unlabeled data.
However, these methods exhibit a significant gap between the size of the filtered, potentially clean subset and the remaining unlabeled subset, particularly at high-noise levels. This gap, as shown in the experimental results in the Experimental results on synthetic datasets section, means that the label-free samples are not fully exploited, indicating potential for performance enhancement. To address this, we propose an improved sample selection framework with an oversampling strategy (SOS). Inspired by UNICON [35], LongReMix [41], and SFA [42], SOS is a simple yet efficient method that mines useful information in label-free instances by combining an oversampling strategy with robust SSL techniques. This enhancement further boosts model performance. As evidenced by the experimental results in the Experimental results on synthetic datasets section, SOS maintains stable performance under high-noise conditions, demonstrating exceptional robustness against noisy labels. We applied SOS to several benchmark datasets, as in previous sample selection methods, and extensive experimental comparisons validate the effectiveness of our approach. Our main contributions are as follows:

  1. We introduce a straightforward but effective sample selection framework with an oversampling strategy, which further utilizes information in label-free samples to achieve SOTA performance.
  2. We propose a uniform sample selection approach, diverging from methods that predominantly rely on estimated noisy posterior probability, to enhance the robustness of DNNs and improve performance.
  3. We introduce an oversampling strategy that complements SOTA one-stage sample selection methods for dataset division and robust training, setting our approach apart from two-stage methods and long-tailed learning research.
  4. SOS exhibits more stable performance than current methods under high-noise levels, with a faster convergence rate.
  5. Through extensive experiments across various noise types and rates, we demonstrate the superiority of SOS over existing SOTA methods, particularly under high-noise conditions.

The rest of this study is organized as follows: The Related works section reviews relevant studies on LNL. The Methodology section details the proposed SOS framework. The Experiments section discusses experimental results, and the Conclusion section presents the conclusions.

Related works

LNL research has emerged as a prominent area of study. The main research directions in this field can be categorized into five groups, as summarized in [43], including loss adjustment, noise transition matrix estimation using robust architecture, label correction based on DNN predictions, robust loss functions, and sample selection. Our work intersects with research on sample selection and long-tailed distribution.

Research on sample selection for LNL

Sample selection methods have garnered significant attention for their SOTA performance. In general, these methods fall into two categories: those utilizing supervised techniques and those employing semi-supervised or unsupervised learning techniques. Research [12, 44] has empirically and theoretically shown that DNNs tend to fit clean samples initially but subsequently overfit noisy samples, resulting in lower losses for clean samples in early epochs. Therefore, early sample selection methods developed dedicated architectures or training strategies to identify and eliminate noisy labels based on this phenomenon. This entire process is supervised, involving only samples with potentially clean labels. For instance, Han et al. [29] introduced the co-teaching strategy using two duplicate networks to select clean samples based on the small loss criterion. Wei et al. [33] proposed the JoCoR framework, an evolution of the co-teaching strategy, where a sample is selected only if both networks predict the same category. Xia et al. [32] combined two multilayer perceptrons (MLPs) with co-teaching to identify clean samples for cross-training the networks.

More recent SOTA methods still filter samples with noisy labels based on loss, but instead of discarding them, they remove the labels and treat these samples as an unlabeled subset. This approach results in a labeled subset and an unlabeled subset, enabling semi-supervised or unsupervised learning techniques for training. Li et al. [34], for example, were the first to incorporate existing SSL techniques into sample selection for LNL, using a GMM to split the data and applying methods such as MixMatch [45] or FixMatch [46] for robust training. Following this, Ortego et al. [31] integrated contrastive learning (CL) to learn robust representations and categorize training samples into label-free and labeled sets. Karim et al. [35] combined Jensen-Shannon divergence (JSD) with unsupervised CL to facilitate robust training under noisy labels. In contrast, Li et al. [37] employed supervised CL to conduct sample selection and robust training. Similarly, Yao et al. [38] utilized JSD to estimate the samples’ likelihood of being clean or noisy, thereby categorizing training data and employing CL to enhance model robustness. Recently, Feng et al. [36] introduced the optimal transport theory to the sample selection process, yielding excellent performance. Diverging from previous methods that solely rely on loss [30, 34] to estimate the posterior probability of a sample containing a noisy label, Huang et al. [39] detected noisy samples by employing two GMMs to establish the relationship between representation and label-noisy annotations. Unlike earlier sample selection methods, which depend on a preset fixed threshold and are ineffective as epochs increase, Li et al. [40] propose a dynamic instance-specific selection method for LNL. Contrasting with previous sample selection methods that incorporate SSL techniques within the same training process for sample selection and robustness training, Cordeiro et al. [41] split these two optimization objectives into two processes.
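To make the shared loss-based splitting idea concrete, the sketch below fits a tiny two-component one-dimensional mixture to per-sample losses and keeps the low-mean ("small-loss") component as the probably-clean set. This is a simplified stand-in for the GMM used in DivideMix-style methods; all names and the hand-rolled EM loop are ours, not taken from any of the cited papers.

```python
import numpy as np

def fit_two_gaussians(losses, iters=50):
    """Tiny 1-D EM for a two-component Gaussian mixture over per-sample
    losses. Returns the posterior probability of the low-mean ("clean")
    component for every sample."""
    x = np.asarray(losses, float)
    mu = np.array([x.min(), x.max()])          # initialize far apart
    var = np.array([x.var(), x.var()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = pi / np.sqrt(2 * np.pi * var) * \
            np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(1, keepdims=True)
        # M-step: update weights, means, and variances
        nk = resp.sum(0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(0) / nk + 1e-6
    clean = int(np.argmin(mu))                 # low-loss component = likely clean
    return resp[:, clean]

# synthetic losses: 700 "clean" low-loss samples, 300 "noisy" high-loss ones
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.2, 0.05, 700), rng.normal(2.0, 0.3, 300)])
clean_mask = fit_two_gaussians(losses) > 0.5
```

In practice DivideMix uses scikit-learn's `GaussianMixture` on normalized losses; the EM loop above just makes the mechanism self-contained.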

Research on long-tailed learning

Existing long-tailed learning (LTL) methods primarily address training datasets characterized by well-annotated yet imbalanced class distributions. In such datasets, some classes possess numerous samples, while others have significantly fewer. Most LTL methods, including those referenced in [44, 47–49], employ random oversampling and undersampling strategies to re-balance class representation during training. In LNL, the clean subset, being smaller than the label-free subset, presents a long-tailed distribution challenge, an area that has received limited research attention. To our knowledge, LongReMix [41] is the pioneer in integrating an oversampling strategy into LNL. However, it is a two-stage method that does not fully capitalize on the information available from label-free samples.

Methodology

If all labels are well-annotated for a dataset with c classes, it can be regarded as a clean dataset D = {(xi, yi)}, i = 1, …, N, where xi ∈ ℜρ is the i-th input image, and yi ∈ {0,1}c denotes its corresponding one-hot target. In this study, we employ two networks for training, each comprising an extractor g(⋅), a classifier f(⋅), and a projection header h(⋅) similar to simCLR [52]. The feature extracted by the extractor for xi ∈ ℜρ is g(xi), and the representation via the projection header is denoted as hi = h(g(xi)). The DNN prediction for xi is denoted as f(g(xi)), simplified as f(xi). Typically, classification training methods predominantly utilize the Cross-Entropy (CE) loss function to train the DNNs. The optimization objective can be expressed as follows: (1) ℓCE = −(1/N) Σi Σj yi,j log fj(xi), where fj(xi) is the j-th component of the softmax prediction.

Considering the gradient of Eq (1) and the powerful fitting capability of DNNs, it is clear that deep neural network models trained with the CE loss attempt to fit all labels to the greatest extent possible. However, when the observed target yi in the dataset contains inaccurate annotations (noisy labels), the DNNs, aided by the CE loss, will also fit these noisily labeled samples as iterations increase, leading to a significant decrease in classification performance and generalization ability [25, 50, 51]. This phenomenon is also known as the overfitting of DNNs to noisy labels in the LNL field [40]. Therefore, LNL aims to learn a global minimizer on noisy datasets that has the same probability of misclassification as f* (a global minimizer of Eq (1) on the noise-free dataset) [50]. The comprehensive framework of the SOS is depicted in Fig 1, while Algorithm 2 provides an in-depth illustration of the process.
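For reference, the CE objective of Eq (1) can be computed numerically as follows (a minimal sketch; the function name is ours):

```python
import numpy as np

def cross_entropy(preds, onehots):
    """Eq (1) sketch: mean cross-entropy over N samples,
    -(1/N) * sum_i sum_j y_ij * log f_j(x_i)."""
    preds = np.clip(np.asarray(preds, float), 1e-12, 1.0)  # avoid log(0)
    return float(-(np.asarray(onehots) * np.log(preds)).sum(axis=1).mean())

# a perfect prediction gives (near-)zero loss; a uniform one gives log(c)
loss_perfect = cross_entropy([[1.0, 0.0], [0.0, 1.0]], [[1, 0], [0, 1]])
loss_uniform = cross_entropy([[0.5, 0.5]], [[1, 0]])
```

Because this objective keeps shrinking as long as any label is imperfectly fit, a sufficiently expressive network will eventually drive the loss down on mislabeled samples too, which is the overfitting behavior described above.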

Fig 1. Overall framework of the SOS.

Each network consists of an extractor g(⋅), a classifier f(⋅), and a projection header h(⋅), similar to simCLR [52]. During training, the dataset is divided into a labeled subset and an unlabeled subset through uniform selection, and then the useful information in the label-free samples is extracted by combining the oversampling strategy with a robust training process, where the robust training is performed using the SSL training technique MixMatch and unsupervised CL. In each network, the two subsets, derived separately from the two networks, are utilized for training. The training process is cyclic, involving repeated iterations of these steps.

https://doi.org/10.1371/journal.pone.0309841.g001

Uniform selection approach

The SOS method combines an additional oversampling mechanism with the uniform sample selection approach and robust training. To elucidate, we first introduce the uniform selection approach. Traditional sample selection methods predominantly depend on the estimated noise posterior probability, derived using a GMM to model the loss distribution. However, these methods do not ensure a uniform number of samples for each class in the clean set. Addressing this limitation, Karim et al. [35] propose the common uniform sample selection, which employs JSD instead of a GMM for detecting noisy labels. As illustrated in Fig 1, two pre-trained networks are utilized to partition the training set. The predictions of these networks for the same input xi via the softmax layer are denoted as p1(xi) and p2(xi), respectively, where pc1(xi) is the c-th component of the prediction from network 1. JSD is used to measure the disagreement di between the averaged prediction p̄i = (p1(xi) + p2(xi))/2 of the two networks and the observed one-hot label yi: (2) di = JSD(yi, p̄i), where the JSD function is (3) JSD(yi, p̄i) = KL(yi‖mi)/2 + KL(p̄i‖mi)/2 with mi = (yi + p̄i)/2, and KL(⋅‖⋅) is the Kullback–Leibler divergence. Here, (4) KL(yi‖mi) = Σj yi,j log(yi,j/mi,j), and KL(p̄i‖mi) can be calculated using a similar formula as Eq (4).
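The disagreement measure of Eqs (2)–(4) can be sketched as follows (variable and function names are ours; we compute the JSD between the averaged prediction of the two networks and the observed one-hot label):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, with a small epsilon."""
    p, q = np.asarray(p) + 1e-12, np.asarray(q) + 1e-12
    return float(np.sum(p * np.log(p / q)))

def jsd(p, q):
    """Eq (3): Jensen-Shannon divergence, symmetric and bounded in [0, log 2]."""
    m = 0.5 * (np.asarray(p) + np.asarray(q))   # Eq (4)'s midpoint distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def disagreement(p1, p2, y_onehot):
    """Eq (2) sketch: JSD between the averaged network predictions
    and the observed one-hot label."""
    return jsd(0.5 * (np.asarray(p1) + np.asarray(p2)), y_onehot)

# a confidently correct prediction yields low disagreement with its label
d_clean = disagreement([0.9, 0.05, 0.05], [0.85, 0.10, 0.05], [1, 0, 0])
d_noisy = disagreement([0.05, 0.9, 0.05], [0.10, 0.85, 0.05], [1, 0, 0])
```

The bounded range of the JSD is what makes a global threshold such as dts in Eq (7) meaningful across classes and epochs.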

The sample selection is based on the disagreement calculated for all training data. Assuming that the sorted (ascending) disagreements of the samples whose observed label is the j-th class are denoted dj(1) ≤ dj(2) ≤ ⋯, j ∈ {1, 2, …, c}, a sample is selected if its disagreement value falls within the first R portion of all the disagreement values for the class indicated by its observed label. This can be mathematically expressed as follows: (5)

R is calculated as follows: (6) where 1(⋅) is an indicator function, and dts is determined by Eq (7): (7) where davg and dmin are the average and minimum disagreement values over all training samples, respectively. τ and dμ are two hyperparameters used for transferring more samples to the unlabeled set; thus, under high-noise rates, the size of the labeled set Dl (Eq (5)) is generally smaller than the size of the unlabeled set Dul = {(xi) | xi ∈ Dη∖Dl}. We illustrate the proportions of labeled and unlabeled samples divided by several SOTA sample selection methods through bar charts [54]. As shown in Figs 2–5, it is evident that in high-noise-rate scenarios (i.e., from 40%-asym. to 80%-sym. on CIFAR-10, and from 50%-sym. to 80%-sym. on CIFAR-100), the sizes of the labeled subsets partitioned by these methods, including the baseline method (UNICON) in this paper, are significantly smaller than those of the unlabeled subsets.
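A minimal sketch of the per-class selection rule of Eq (5), assuming the selection ratio R has already been computed via Eqs (6)–(7) (function and variable names are ours):

```python
import numpy as np

def uniform_select(d, labels, R, c):
    """Eq (5) sketch: within each observed class j, keep the samples whose
    disagreement d falls in the smallest R fraction (ascending order), so
    every class contributes roughly the same proportion to the labeled set."""
    keep = np.zeros(len(d), dtype=bool)
    for j in range(c):
        idx = np.where(labels == j)[0]      # samples observed as class j
        k = int(np.ceil(R * len(idx)))      # size of the first R portion
        keep[idx[np.argsort(d[idx])[:k]]] = True
    return keep

# six samples, two classes, R = 0.5: the two lowest-d samples per class stay
d = np.array([0.10, 0.90, 0.20, 0.80, 0.30, 0.70])
labels = np.array([0, 0, 0, 1, 1, 1])
keep = uniform_select(d, labels, R=0.5, c=2)
```

Selecting per class rather than globally is what prevents easy classes from crowding out hard ones in the labeled subset.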

Fig 2. The proportions of labeled and unlabeled samples divided by existing SOTA methods on CIFAR datasets with 40%-asym.

https://doi.org/10.1371/journal.pone.0309841.g002

Fig 3. The proportions of labeled and unlabeled samples divided by existing SOTA methods on CIFAR datasets with 80%-sym.

https://doi.org/10.1371/journal.pone.0309841.g003

Fig 4. The proportions of labeled and unlabeled samples divided by existing SOTA methods on CIFAR datasets with 50%-sym.

https://doi.org/10.1371/journal.pone.0309841.g004

Fig 5. The proportions of labeled and unlabeled samples divided by existing SOTA methods on CIFAR datasets with 80%-sym.

https://doi.org/10.1371/journal.pone.0309841.g005

Robust SSL training with oversampling strategy

Most current SOTA sample selection methods employ SSL techniques to train DNNs simultaneously on labeled and unlabeled sets. They sample an equal number of labeled and unlabeled samples in each epoch, where the number is dependent on the size of the labeled subset. This strategy is effective for low-noise scenarios as the number of labeled samples is greater than that of unlabeled samples (as shown in Figs 2 and 4). However, as shown in Figs 3 and 5, under high-noise conditions, the size of the labeled set is often smaller than that of the unlabeled set. Consequently, this imbalance can interrupt robust SSL training due to the premature depletion of labeled samples in the data-loader. Such interruptions prevent many unlabeled samples from being learned by the DNNs, leaving room for performance enhancement.

To address this issue, we introduce an oversampling strategy. Commonly used in LTL, oversampling involves sampling more frequently from classes with fewer instances (rare classes) to maintain class balance. In this study, since |Dl| ≪ |Dul|, we consider Dl as a rare class and Dul as a massive class. To prevent training from prematurely terminating due to the exhaustion of labeled samples in the data-loader, it is necessary to oversample more training data from the labeled set. This approach compels the DNNs to assimilate more useful information from the unlabeled samples, which the previous SSL training process might have overlooked. Consequently, the pseudocode for the robust SSL training incorporating the oversampling strategy, inspired by [41], is presented below.

  1. Algorithm 1 oversampling during robust SSL training
  2. Input: the labeled set Dl and the unlabeled set Dul, two networks f1 and f2, batch-size b, robust SSL training function Fssl (Blabeled,Bunlabeled,f1,f2);
  3. Calculate the number of iterations in the labeled set based on the batch size b: ;
  4. Initialize the current iteration of the labeled data-loader: itercont = 0;
  5. while itercont < itermax:
  6.  sample a mini-batch Blabeled from Dl;
  7.  sample a mini-batch Bunlabeled = {(xi)∈Dul;i ∈{1, …,b}} from Dul;
  8.  perform robust SSL training based on the current two mini-batches: Fssl (Blabeled,Bunlabeled,f1,f2);
  9.  add the iteration of the labeled data-loader: itercont += 1;
  10. if itercont ≥ itermax:
  11.   break;
  12. end if
  13. end while
  14. Output: two networks f1 and f2.
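One plausible reading of Algorithm 1's oversampling in code: cycle the (smaller) labeled loader so that every unlabeled mini-batch is consumed, rather than stopping when the labeled loader is exhausted. This is a sketch with our own names, using plain lists in place of data-loaders:

```python
from itertools import cycle

def oversampled_batches(labeled, unlabeled, b):
    """Pair every unlabeled mini-batch with a labeled mini-batch, cycling
    (i.e., oversampling) the labeled set when it runs out, so robust SSL
    training never stops early for lack of labeled samples."""
    starts = cycle(range(0, len(labeled), b))     # restart labeled loader
    for u in range(0, len(unlabeled), b):
        l = next(starts)
        yield labeled[l:l + b], unlabeled[u:u + b]

labeled = list(range(10))      # |D_l| = 10, much smaller under high noise
unlabeled = list(range(100))   # |D_ul| = 100
pairs = list(oversampled_batches(labeled, unlabeled, b=10))
```

In a PyTorch implementation the same effect is typically achieved with a sampler that draws from the labeled set with replacement.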

Since these unlabeled samples have several potential representations useful for unsupervised CL, we adhere to the methodologies in [31, 35, 37] to integrate the CL method into the robust SSL training process post-division to further improve the robustness and performance of the model.

Below, we describe the robust SSL training method Fssl (Blabeled,Bunlabeled,f1,f2) employed in this study. Consistent with the MixMatch-for-LNL approach used in previous works [34–40], for each sample xi ∈ Blabeled ∪ Bunlabeled, we initially perform two weak data augmentations, i.e., xi,1, xi,2 = w_da(xi). Subsequently, label co-guessing is performed for both labeled and unlabeled samples, as depicted below: (8) When xi belongs to Blabeled, the refined label is derived by weighting the original observed target and the predictions of the current training network fcur on the two weakly augmented views xi,1 and xi,2. Otherwise, when xi belongs to Bunlabeled, the guessed label is determined solely by averaging the predictions of the two networks f1 and f2 on the two weakly augmented views, and sharpen(⋅) is expressed as follows: (9) sharpen(p̄i, S)j = p̄i,j^(1/S) / Σk p̄i,k^(1/S), where S is the temperature parameter, p̄i is the average prediction of the two networks on the two weak data augmentations of input xi, and p̄i,j represents the j-th component of p̄i. After the label co-guessing step, we replace the original label of sample xi with the refined label. Subsequently, we combine the refined label with two instances of strong data augmentation xi,3, xi,4 = s_da(xi) to form two new pairs. Finally, two new mini batches are obtained: (10)

Here Blabeled and Bunlabeled are the two mini batches sampled from Dl and Dul respectively, as illustrated in Algorithm 1, and b is the batch size. Through Eq (10), each input xi in Blabeled and Bunlabeled is augmented into two inputs xi,3 and xi,4 with strong data augmentation operation.
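The sharpening operation of Eq (9) is the standard MixMatch temperature sharpening, which can be sketched as:

```python
import numpy as np

def sharpen(p, S=0.5):
    """Eq (9): temperature sharpening. Raising each component of the
    averaged prediction p to the power 1/S (with S < 1) and renormalizing
    concentrates the mass on the dominant class."""
    p = np.asarray(p, float) ** (1.0 / S)
    return p / p.sum()

q = sharpen([0.5, 0.3, 0.2], S=0.5)  # the leading class gains probability mass
```

Lower S yields harder pseudo-labels; S → 0 approaches a one-hot argmax.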

The loss function of the SSL training based on the refined labels at the t-th epoch is expressed as follows: (11) where N1 is the size of the mixed labeled batch, N2 is the size of the mixed unlabeled batch, and π represents a c-dimensional vector in which each element is 1/c (i.e., to keep the model’s predictions uniformly distributed). CE is given in Eq (1), and λu and λreg are predefined tradeoffs. Furthermore, the mixed batches are generated from the two mini-batches in Eq (10) using Mixup [53], as employed in previous studies. The process of generation is as follows: (12) x̂ = λxa + (1 − λ)xb, ŷ = λya + (1 − λ)yb, where λ ~ Beta(α, α) and α is a predefined hyperparameter. Following the methodologies outlined in [31, 35, 37], we integrate the CL method into the robust SSL training process, yielding a total optimization objective expressed as follows: (13) Ltotal = Lssl + λcl·Lcl, where the contrastive loss Lcl is computed on the strongly augmented inputs obtained from Eq (10), and λcl = 0.025 is a coefficient. Ultimately, we employ Eq (13) and Algorithm 1 to determine the global minimizer during the robust SSL training.
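The Mixup interpolation of Eq (12) can be sketched as follows (a minimal version; some implementations additionally take λ ← max(λ, 1 − λ) so the first input dominates, which we omit here):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=4.0, rng=None):
    """Eq (12) sketch: interpolate two input/label pairs with a
    Beta(alpha, alpha)-distributed coefficient lam."""
    rng = np.random.default_rng(0) if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

# mixing an all-zeros and an all-ones input with their one-hot labels
xm, ym = mixup(np.zeros(4), np.array([1.0, 0.0]),
               np.ones(4), np.array([0.0, 1.0]))
```

The mixed label ym remains a valid probability distribution, so it can be fed directly to the CE and consistency terms of Eq (11).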

Proposed SOS framework

We propose the SOS framework, an enhancement of the uniform sample selection method discussed in the Uniform selection approach section, integrated with the robust SSL training and oversampling strategy outlined in the Robust SSL training with oversampling strategy section. This one-stage method efficiently divides the training data and conducts robust training within the same epoch. An overview of our method is depicted in Fig 1, with the detailed workflow presented in Algorithm 2.

  1. Algorithm 2 SOS
  2. Input: the training set Dη, two networks f1, f2, the tradeoff factors λu, λreg, hyperparameter α, filtering factors dμ and τ, Tw is the warm-up epochs, Ttot is the total training epochs, temperature parameter S, batch-size b;
  3. for t = 1 to Ttot do:
  4. if t < Tw:
  5.   pre-train two networks f1 and f2 based on the original training set Dη using CE loss function;
  6. else:
  7.   //the training of network f1;
  8.   fcur = f1;
  9.   //divide the training data via the uniform sample selection approach as illustrated in the Uniform selection approach section;
  10.   construct the labeled and unlabeled sets Dl and Dul for network f1 using Eq (5);
  11.   perform robust SSL training on network f1 employing the oversampling strategy, as outlined in Algorithm 1;
  12.   //the training of network f2;
  13.   fcur = f2;
  14.   //divide the training data via the uniform sample selection approach as illustrated in the Uniform selection approach section;
  15.   construct the labeled and unlabeled sets Dl and Dul for network f2 using Eq (5);
  16.   perform robust SSL training on network f2 employing the oversampling strategy, as outlined in Algorithm 1;
  17. end if
  18. t = t +1  //incremental training epochs.
  19. end for
  20. Output: the labeled set Dl, two robust networks f1 and f2.
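The control flow of Algorithm 2 can be summarized as a short sketch (all function arguments are hypothetical stand-ins for the paper's components, passed in as callables):

```python
def train_sos(dataset, f1, f2, t_warm, t_total, divide, warmup_step, ssl_train):
    """High-level sketch of Algorithm 2: warm up both networks with CE,
    then, per epoch and per network, divide the data via uniform selection
    (Eq (5)) and run oversampled robust SSL training (Algorithm 1)."""
    for t in range(1, t_total + 1):
        if t < t_warm:
            warmup_step(dataset, f1, f2)        # CE pre-training on D_eta
        else:
            for f_cur in (f1, f2):
                d_l, d_ul = divide(dataset, f_cur)  # labeled/unlabeled split
                ssl_train(d_l, d_ul, f1, f2)        # Algorithm 1
    return f1, f2
```

Because division is redone every epoch with the latest networks, the labeled set adapts as the models improve, which is what makes this a one-stage method.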

Experiments

In this section, we evaluate the performance of the SOS framework on both synthetic and real-world datasets with noise. The characteristics of the datasets used are described below. It is worth noting that the hyperparameters used in the experiments for each dataset in this paper are largely consistent with those used in UNICON and DivideMix. All experiments on the CIFAR datasets in this paper are conducted on a server running Windows 11, equipped with a single 4090 GPU and a 13900K CPU. Experiments on other datasets are conducted on a server running Windows Server 2016, equipped with a single A800 GPU and a Xeon 6248 CPU. The IDE used for all experiments is PyCharm 2023, and the model framework is PyTorch 1.8.0.

The details of datasets

The details of CIFAR-10 and CIFAR-100.

Basic overview. CIFAR-10 and CIFAR-100 [55] are two clean datasets with 10 and 100 categories, respectively. Each dataset contains 50K training images and 10K testing images. Table 1 outlines their basic characteristics, and Figs 6 and 7 display sample features of some classes in CIFAR-10 and CIFAR-100.

Synthesis of noisy labels. Given the challenge of determining noise characteristics in real-world datasets, prior studies often utilize CIFAR-10/100 [55] to create controlled synthetic label noise at various rates, including both symmetric and asymmetric types, to test the efficacy of proposed methods. Table 1 summarizes these two datasets, noting that only the labels of the training data are altered with generated synthetic noisy labels. Symmetric noise implies that each sample’s target has a probability η/(c−1) of being randomly flipped to each of the other categories, and a 1−η chance of remaining unchanged. Asymmetric noise involves a fixed probability η of each target being mapped to a predetermined class. Notably, the asymmetric noise in CIFAR-10 mimics the structure of real-world label noise, exemplified by mappings such as truck→automobile, bird→airplane, deer→horse, and cat→dog. In CIFAR-10, the asymmetric noise transition matrix is relatively sparse, while in CIFAR-100, asymmetric noisy labels are generated by shifting each target to the subsequent category within its superclass, resulting in a denser transition matrix.
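The two noise models described above can be synthesized as follows (a sketch; function names and the mapping argument are ours):

```python
import numpy as np

def inject_symmetric_noise(labels, eta, c, rng=None):
    """Symmetric noise: each label is flipped with probability eta,
    uniformly to one of the other c - 1 classes, so any given wrong
    class receives probability eta / (c - 1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.array(labels)
    for i in np.where(rng.random(len(noisy)) < eta)[0]:
        noisy[i] = rng.choice([k for k in range(c) if k != noisy[i]])
    return noisy

def inject_asymmetric_noise(labels, eta, mapping, rng=None):
    """Asymmetric noise: each label moves to its predetermined target
    class (e.g., truck -> automobile) with probability eta; classes
    absent from the mapping are left unchanged."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.array(labels)
    for i in np.where(rng.random(len(noisy)) < eta)[0]:
        noisy[i] = mapping.get(int(noisy[i]), noisy[i])
    return noisy
```

For CIFAR-10, the asymmetric `mapping` would encode the class pairs listed above; for CIFAR-100, it would shift each class to the next one within its superclass.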

Experimental setup. To evaluate the robustness of SOS on these datasets, we use PreAct ResNet-18, aligning with previous studies. The hyperparameters for CIFAR-10/100 are detailed in Table 2. We set λu = 30 for all CIFAR-10 experiments, except for the 10%-asymmetric and 20%-symmetric label noise (10%-asym and 20%-sym) scenarios, where λu = 0. Although using customized hyperparameter settings for certain noise scenarios could achieve better performance, to demonstrate the robustness of the hyperparameters of our method and to avoid additional ablation experiments, as well as to ensure fair comparison with previous methods such as UNICON and DivideMix, we employ almost identical settings for all noise scenarios on this dataset. The hyperparameter settings are identical to UNICON. The learning rate undergoes linear reduction post-warm-up. The weak data augmentation (w_da) follows previous works’ protocols, such as mean subtraction and random flip, while the strong data augmentation (s_da) adopts the CIFAR10-Policy [59].

Table 2. Settings of the hyperparameters used in this study.

https://doi.org/10.1371/journal.pone.0309841.t002

The details of CIFAR-N.

Basic overview. Although synthetic label noise can be generated as described above, modeling real-world noise patterns accurately remains challenging [56]. In addition, many existing real-world noisy datasets complicate the analysis of proposed LNL methods due to the absence of true labels and their large sizes. To address this, Wei et al. [56] developed CIFAR-N, a controllable, moderately sized real-world noisy dataset based on the training images from CIFAR-10 and CIFAR-100. CIFAR-N comprises five types of noise labels for CIFAR-10: aggregate (9.03%), random1-3 (17.23%/18.23%/17.64%), and worst (40.21%); one type for CIFAR-100, namely noisy (40.2%). Table 1 outlines the basic characteristics of CIFAR-N and the features of this dataset are shown in Figs 6 and 7.

Experimental setup. The details of the hyperparameters used are outlined in Table 2. The learning rate undergoes linear reduction after warm-up, and we set λu = 0 for all noise types except for the worst and noisy types, where λu = 30. The data augmentation strategy follows that of CIFAR-10/100.

The details of WebVision.

Basic overview. WebVision [57] is a real-world noisy dataset comprising 2.4 million training images collected from the internet. Our evaluation of SOS utilizes the first 50 categories of the Google image subset, as done in previous studies. The noise rate is reported to be about 20% in previous works. Dataset specifics are documented in Table 1, while a visualization of some samples from this set is shown in Fig 8.

Experimental setup. The hyperparameters of this set are detailed in Table 2. The learning rate is linearly reduced after warm-up, and λu = 0. The “w_da” mirrors that of CIFAR-10/100, while the “s_da” adopts the ImageNet-Policy.

The details of Clothing1M.

Basic overview. Clothing1M [58] is a real-world noisy dataset with 14 classes of training data. This dataset was crawled from several shopping sites and contains 38.5% noisy labels in the training set. Its specifics are outlined in Table 1, and a visualization of samples is shown in Fig 9.

Fig 9. Visualizing samples from Clothing1M.

We randomly select 10 images from each of the first 10 categories for display from CIFAR datasets and 14 images from each of the first 14 categories from WebVision and Clothing1M.

https://doi.org/10.1371/journal.pone.0309841.g009

Experimental setup. The hyperparameters used in this study are detailed in Table 2. Notably, we exclusively utilize the 1M training images for training, without employing an extra clean validation set. We randomly sample 32K instances from the entire dataset to implement SOS at each epoch, following [34, 35, 37, 41]. The learning rate is linearly reduced after warm-up, and λu = 0. The data augmentation strategy employed is identical to that used in WebVision.

Experimental results on synthetic datasets

We implement the SOS framework on two synthetic noisy datasets, examining a range of noise rates and types. For the synthetic label noise, we present the results for symmetric noise at {20%, 50%, 80%, 90%} and asymmetric noise at {10%, 30%, 40%}, in line with the methodologies of previous studies.
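The synthetic corruption protocol can be sketched as below, following the common convention in [34, 35]: symmetric noise redraws a label uniformly over all classes (so a flip may land back on the true class), while asymmetric noise maps each class to a fixed confusable class. The transition map in the test is a hypothetical two-class example, not the full CIFAR mapping.

```python
import random

def symmetric_noise(labels, rate, num_classes, seed=0):
    """Redraw the labels of a `rate` fraction of samples uniformly at random."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), int(rate * len(labels))):
        noisy[i] = rng.randrange(num_classes)  # may re-draw the true class
    return noisy

def asymmetric_noise(labels, rate, transition, seed=0):
    """Flip each label to its designated confusable class with probability `rate`."""
    rng = random.Random(seed)
    return [transition[y] if rng.random() < rate and y in transition else y
            for y in labels]
```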

Results on CIFAR-10.

Table 3 compares the performance of SOS with other SOTA benchmarks on the CIFAR-10 dataset across two types of noise. Although most methods report only the best test accuracies for this dataset, we present both the highest accuracies achieved during the entire training process and the average test accuracies over the last 10 epochs. These results are provided for various noise rates and types. Notably, SOS surpasses nearly all other SOTA baselines. The only exceptions occur under the 40%-asym noise condition, where SOS is marginally outperformed by DISC and LongReMix by 0.4% and 0.5%, respectively. This slight underperformance is attributed to these two methods adapting their hyperparameters for different noise rates, whereas SOS employs consistent hyperparameters across conditions. However, in experiments with 50%/80%-sym noise, SOS leads these two methods significantly, by 0.8%/7% and 0.8%/1.4%, respectively. This demonstrates SOS's ability to maintain robustness and effectiveness across varying noise types and ratios. Furthermore, as detailed in the Proposed SOS framework section, when the noise ratio is excessively high (e.g., 50%-sym), the size of the labeled subset obtained by existing sample selection methods is considerably smaller than that of the unlabeled set. This discrepancy hinders DNNs from fully learning the representations in the unlabeled samples when employing robust training with CL and SSL techniques, thus limiting performance enhancement. To address this issue, our study introduces an oversampling strategy that encourages models to extract more representations from the unlabeled dataset, leading to substantial improvements even under high label noise. For instance, under the 90%-sym noise condition, SOS outperforms co-teaching+, DivideMix, MOIT+, Sel-CL+, UNICON, LongReMix, and TCL by 45%, 16%, 17.3%, 10.1%, 12%, and 1.2%, respectively. Even in asymmetric noise scenarios, SOS maintains superiority over nearly all methods except in the 40%-asym case.
Fig 10 illustrates the test accuracy of SOS on CIFAR-10. In addition, Fig 11 compares the classification performance of SOS and UNICON under the 50%-sym and 40%-asym conditions, showing that SOS not only converges faster than UNICON but also achieves superior performance. Moreover, we provide other common metrics for our method on CIFAR-10, such as precision, recall, F1-score, and the confusion matrix. The first three metrics are shown in Table 3, while the confusion matrices obtained on the test set at the final epoch are illustrated in Figs 12–18. Observing the accuracy, precision, recall, and F1-score results in Table 3, we can see that the four metrics are very close to each other. Since most existing LNL methods only report accuracy, we follow this practice in the subsequent experiments for fair comparison.
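For reference, the reported metrics can be computed from model predictions as in the following sketch; this is a plain-Python stand-in, and in practice one would typically use sklearn.metrics.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def macro_prf(cm):
    """Macro-averaged precision, recall, and F1-score from a confusion matrix."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        pred_c = sum(cm[r][c] for r in range(n))   # column sum: predicted as c
        true_c = sum(cm[c])                        # row sum: truly c
        p = tp / pred_c if pred_c else 0.0
        r = tp / true_c if true_c else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p); recalls.append(r); f1s.append(f)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```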

Fig 10. Curve of test accuracy on CIFAR-10.

The test results of SOS on CIFAR-10 with different noise rates.

https://doi.org/10.1371/journal.pone.0309841.g010

Fig 11. The comparison of classification performance between SOS and UNICON.

The steep drop in both figures marks the end of the warm-up stage.

https://doi.org/10.1371/journal.pone.0309841.g011

Fig 12. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 20%-sym scenario.

https://doi.org/10.1371/journal.pone.0309841.g012

Fig 13. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 50%-sym scenario.

https://doi.org/10.1371/journal.pone.0309841.g013

Fig 14. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 80%-sym scenario.

https://doi.org/10.1371/journal.pone.0309841.g014

Fig 15. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 90%-sym scenario.

https://doi.org/10.1371/journal.pone.0309841.g015

Fig 16. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 10%-asym scenario.

https://doi.org/10.1371/journal.pone.0309841.g016

Fig 17. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 30%-asym scenario.

https://doi.org/10.1371/journal.pone.0309841.g017

Fig 18. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 40%-asym scenario.

https://doi.org/10.1371/journal.pone.0309841.g018

Moreover, Table 4 presents a comparison of the test results between our proposed method and other SOTA methods, such as those mentioned in [35], under severe symmetric label noise conditions (specifically 90%, 92%, and 95%), which pose a massive challenge for previous methodologies. Impressively, our method demonstrates satisfactory performance even in these demanding scenarios. Notably, SOS significantly outperforms UNICON by 1.5%, 2.5%, and 1.5% in experiments conducted with 90%, 92%, and 95% symmetric label noise, respectively. Furthermore, the efficacy of our method is visually represented through the test accuracy curve shown in Fig 19, providing a more intuitive understanding of its performance. In the 95% symmetric noise scenario, taking the bird category as an example, current research on symmetric noise mostly flips the labels of 95% of bird samples to a category drawn uniformly at random, including the bird category itself [34, 35, 36, 51]. This means a certain proportion of samples still retain the bird label. Therefore, the number of clean-label samples for the bird category is in fact greater than 250 (i.e., 5%×5000), approaching 725 (i.e., 5%×5000+95%×5000/10). The distribution of samples for the other categories is similar. From a statistical perspective, correct classification therefore remains possible in this scenario. Unfortunately, DivideMix's performance deteriorates drastically, whereas UNICON and our method maintain stable clustering capabilities by introducing balanced selection and contrastive loss strategies for the feature extractors. Combined with the partitioned clean-label samples, this ensures the robustness of the classifier. Additionally, since SOS introduces an oversampling technique, it can assist the contrastive learning module in more thoroughly mining the information carried by unlabeled samples, thereby achieving better performance than UNICON in high-noise-ratio scenarios.
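The counting argument above can be verified in a few lines; the numbers are taken directly from the text (5000 samples per CIFAR-10 class, 95% symmetric noise, 10 classes).

```python
n_per_class, noise_rate, num_classes = 5000, 0.95, 10

# Labels that are never selected for flipping stay clean.
kept_clean = (1 - noise_rate) * n_per_class
# A flipped label lands back on the true class with probability 1/num_classes.
flipped_back = noise_rate * n_per_class / num_classes
expected_correct = kept_clean + flipped_back
```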

Fig 19. Test curve of SOS and UNICON on CIFAR-10 with severe label noise.

“SOS (30)” and “SOS (50)” represent the results of our method when λu = 30 and λu = 50, respectively.

https://doi.org/10.1371/journal.pone.0309841.g019

Table 4. Classification performance on CIFAR-10 with heavy symmetric noise.

https://doi.org/10.1371/journal.pone.0309841.t004

Results on CIFAR-100.

Table 5 presents the highest test accuracy achieved on CIFAR-100 throughout the entire training process, as well as the average test accuracies over the last ten epochs. In these comparisons, SOS consistently outperforms nearly all other SOTA methods. Notably, even when the noise rate is relatively low (a scenario where current methods match the accuracy obtained on clean datasets), SOS still maintains a significant lead. More impressively, SOS's superiority is evident in severe noise environments, such as 80%-sym. For instance, under the 80%-sym condition, SOS surpasses co-teaching, DivideMix, ELR+, UNICON, DISC, LongReMix, and TCL by 65%, 10%, 9%, 6%, 12%, 7%, and 5%, respectively. This indicates that SOS can approximate a global minimizer comparable to one obtained on clean datasets. However, it is important to acknowledge that when dealing with higher noise rates and a larger number of categories, the SOS method falls slightly behind the current SOTA methods. This limitation stems from the uniform selection approach and the consistent use of the same hyperparameters during training. Fortunately, datasets with such characteristics are rare in practice. In addition, no single method consistently outperforms SOS, highlighting the stability of our approach. For a more intuitive comparison, the test accuracy curve of SOS on CIFAR-100 is provided in Fig 20. Fig 21 compares SOS and UNICON under the 50%-sym and 30%-asym conditions, clearly demonstrating SOS's superiority over current SOTA methods.
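The two statistics reported throughout (best accuracy over all epochs, and the mean over the final ten) follow a simple protocol, sketched here with illustrative numbers:

```python
def summarize_accuracy(acc_per_epoch, last_n=10):
    """Return (best test accuracy over all epochs, mean over the last `last_n`)."""
    tail = acc_per_epoch[-last_n:]
    return max(acc_per_epoch), sum(tail) / len(tail)
```

Reporting the tail average alongside the best value exposes methods whose accuracy peaks early and then degrades as the network memorizes noisy labels.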

Experimental results on real-world datasets

We apply the SOS method to three real-world noisy datasets. Specifically, for the CIFAR-N dataset, which contains real-world noise, we comprehensively compare SOS's performance against the current SOTA methods across various noise types.

Results on CIFAR-N.

Table 6 presents the results obtained using SOS on the CIFAR-N dataset, with Figs 22 and 23 illustrating a comparative analysis between SOS and UNICON on this dataset. CIFAR-N, a real-world noisy dataset compiled via crowdsourcing platforms, offers a more realistic assessment of the methods’ effectiveness and robustness. Unlike most methods that report only the best test accuracies on this dataset, we provide both the highest accuracies throughout the training process and the average test accuracies over the final 10 epochs, encompassing various noise rates and types. In addition, we have included results for UNICON obtained by running its publicly available code. Consistently, SOS outperforms almost all other SOTA baselines, particularly under severe label noise scenarios, such as “worst” and “noisy.” It is noteworthy that many SOTA methods adjust their hyperparameter settings for the seven noise types, whereas we maintained almost identical settings to those used for CIFAR-10/100. Despite this, our method demonstrates superior performance. For instance, under the “worst” label noise condition (CIFAR-10N), SOS surpasses co-teaching, DivideMix, ELR+, JoCoR+, UNICON, ILL, and PLS by 11%, 2.4%, 3.8%, 0.6%, 1.4%, and 1.2%, respectively. In the more challenging “noisy” scenario on CIFAR-100N, SOS still leads these methods by margins of 13%, 2.1%, 14%, 1.6%, 5.2%, and 0.1%. These results across various experiments on this dataset demonstrate that SOS, with its robust representation learning from unlabeled samples, consistently outperforms other SOTA methods. Furthermore, as shown in Figs 22 and 23, SOS converges faster and achieves higher accuracy than current SOTA methods.

Results on WebVision.

Table 7 details the highest test accuracy achieved on WebVision throughout the training process. In line with prior research, we report the test accuracies for the validation set in WebVision and the ILSVRC2012 validation set. In addition, we include both top-1 and top-5 accuracies for these two validation sets. Although SOS is somewhat less effective than LongReMix on WebVision, it is important to note that LongReMix utilizes a two-stage sample selection method, which requires twice the training time of SOS. This difference in training time becomes more significant in large-scale datasets. Although TCL achieved the highest performance on WebVision, surpassing our method by 1.1%, our method outperformed TCL on ILSVRC12 with a margin of 1.44%. Thus, overall, the method presented in this study demonstrates greater stability and consistently better performance across both validation sets, which is a significant advantage in practical applications.
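The top-1 and top-5 accuracies reported in Table 7 follow the standard definition, sketched here in plain Python: a prediction counts as correct if the true class appears among the k highest-scoring classes.

```python
def topk_accuracy(logits, targets, k=1):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    correct = 0
    for scores, target in zip(logits, targets):
        # Rank class indices by descending score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        correct += target in ranked[:k]
    return correct / len(targets)
```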

Table 7. Results on WebVision using pre-trained ResNet-50.

https://doi.org/10.1371/journal.pone.0309841.t007

Results on Clothing1M.

Table 8 displays the highest test accuracy achieved on the Clothing1M dataset over all training epochs. Although TCL and UNICON show a slight performance edge over SOS, it is important to consider the differences in the running environments and the substantial scale of Clothing1M, which comprises one million images. Given these factors, SOS, TCL, and UNICON exhibit comparable performance on this dataset. We replicated UNICON using its publicly available code and maintained the same hyperparameter settings as those used in our method. The replicated results presented in Table 8 are 1.0% lower than UNICON's originally reported results, suggesting that these three methods are essentially on par in terms of performance. Furthermore, SOS demonstrates competitiveness with, or superiority over, other LNL methods from 2023, such as LongReMix, OT-Filter, DISC, and TCL.

Table 8. Results on Clothing1M using pre-trained ResNet-50.

https://doi.org/10.1371/journal.pone.0309841.t008

Ablation results

As shown in Tables 2–4, we utilize nearly identical hyperparameter settings for the different noise scenarios on the CIFAR datasets. In contrast, existing LNL methods such as DivideMix, LongReMix, and DISC (which employ the same SSL technique as our method) typically require real-time adjustment of multiple hyperparameters, including thresholds and loss weights, based on the noise type and noise ratio; the fact that our settings remain fixed sufficiently demonstrates the robustness of our approach. Additionally, our method is inspired by UNICON, which has already shown that DNNs are insensitive to the hyperparameters listed in Table 2 when using a uniform selection approach and SSL techniques. Therefore, a sensitivity analysis of the hyperparameters with respect to noise information is not repeated here; please refer to Section 4.3 of [34] and Section 11 of [35] for further analyses. Moreover, SOS consistently applies the same hyperparameter settings across all datasets, except for λu, whose effects have been analyzed in previous works [34, 35, 41], revealing that SOS is relatively insensitive to variations in hyperparameter settings. Consequently, our analysis focuses primarily on the effects of the oversampling strategy, comparing scenarios with (w/) and without (w/o) this strategy in terms of test accuracy, specifically on the CIFAR-10 and CIFAR-N datasets. Table 9 presents the results of the ablation study, reporting only the best outcomes observed throughout all epochs. The results indicate that the oversampling strategy enhances model performance by leveraging the feature representations in label-free samples more effectively. This finding supports the utility of noisy samples in improving the classification performance and robustness of DNNs.
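For intuition, the ablated oversampling step can be pictured with the following hypothetical sketch (not the released implementation): the smaller labeled subset is repeated and randomly topped up until its effective size matches the unlabeled subset, so mini-batches draw on both pools at comparable rates.

```python
import random

def oversample_indices(labeled_idx, unlabeled_size, seed=0):
    """Repeat and randomly top up labeled indices to match the unlabeled size."""
    rng = random.Random(seed)
    if len(labeled_idx) >= unlabeled_size:
        return list(labeled_idx)
    # Whole repeats first, then a random remainder without replacement.
    out = list(labeled_idx) * (unlabeled_size // len(labeled_idx))
    out += rng.sample(labeled_idx, unlabeled_size - len(out))
    rng.shuffle(out)
    return out
```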

Discussion

Efficiency

Table 10 compares the training times of our method, UNICON, and DivideMix on CIFAR-10 with 50% symmetric noise and 40% asymmetric noise. All experiments are conducted on a server equipped with a single NVIDIA RTX 4090 GPU, with the "num_worker" of the dataloader set to 0 in all experiments. As the table clearly indicates, although the introduction of the oversampling technique adds some training time, the overhead is not significant. Since training is performed only once, the testing-phase time is identical, and we achieve better performance, we consider the additional overhead acceptable.

Table 10. The training time cost (hours, i.e., “h”) on CIFAR-10 with 50% symmetric noise.

https://doi.org/10.1371/journal.pone.0309841.t010

Pros

Our extensive experimental comparisons, encompassing both synthetic and real-world noisy datasets, have demonstrated the effectiveness of SOS. Our experiments on CIFAR-10/100 show that SOS offers several advantages over current methods such as DISC, LongReMix, and UNICON, particularly in practical applications. These advantages include robustness to hyperparameter variations, faster convergence, and superior performance under high-noise conditions. The results from three real-world noisy datasets further corroborate these advantages. In addition, we have validated the effectiveness of the oversampling strategy through comprehensive ablation studies on both synthetic and real-world datasets. We have established that the label-free samples identified after sample selection significantly enhance model performance, an aspect overlooked in earlier studies.

Limitation

Despite the aforementioned advantages of our method, limitations are inevitable. Following previous research, we discuss the limitations of our method from the perspectives of robustness [41], generalization [69], and trustworthiness [70]. First, the experiments on synthetic noise datasets show that our method exhibits excellent robustness in most noise scenarios. However, in high-noise scenarios with a large number of classes, its robustness still has shortcomings. For example, on CIFAR-100 with 40% asymmetric noise and 90% symmetric noise, our method significantly lags behind the SOTA methods TCL and DISC, respectively (e.g., 52.0% vs 54.5%, and 74.9% vs 76.5%). This is mainly because, under such extreme noise conditions with many classes, the number of noisy samples is very close to the number of clean samples. For instance, in the 40% asymmetric noise scenario, the apple class has 500 samples, at least 40% of which are flipped to mushrooms. In this case, the number of clean samples in the apple class is at most 300, and more than 200 samples are labeled as mushrooms. Under these circumstances, our balanced partitioning mechanism faces a significant challenge. In CIFAR-10, however, since the number of samples per class increases to 5000, the gap between clean and noisy samples widens, resulting in better performance. In other words, this issue can be mitigated by increasing the number of samples per class. Additionally, the experimental results on WebVision reveal that, although our method reduces the performance gap on WebVision and ILSVRC12 compared with existing methods such as TCL and LongReMix, a considerable difference remains (i.e., 1.3%), indicating that our method's generalization is insufficient.
Finally, in the 40% asymmetric noise scenario on CIFAR-100, the close proximity of noisy and clean samples leads to suboptimal performance of our sample partitioning strategy, indicating poor trustworthiness in this scenario. Therefore, improving the model's confidence on noisy samples is a worthwhile focus for future work.

Adaptability

As illustrated above, one significant advantage of our method is that it achieves better performance while being insensitive to hyperparameters, thus eliminating the need for prior noise information and enhancing its usability. Additionally, as demonstrated by our experiments on several benchmark datasets, our method consistently achieves similar performance across different datasets, indicating its strong generalization capability. Therefore, we believe that our method can be applied to a variety of noise scenarios, e.g., symmetric, asymmetric, and mixed (i.e., real-world [40]) noisy datasets.

Conclusion

In this study, we propose SOS, an enhanced sample selection framework with an oversampling strategy designed for LNL. Our research has identified a gap in current sample selection methods: their inability to fully harness the potential of the representations in label-free samples, which is crucial for boosting model robustness and performance. SOS addresses this issue by integrating an oversampling strategy with SOTA SSL methods. Through comprehensive experiments, we have established that SOS exhibits low sensitivity to hyperparameter variations and consistently delivers optimal or near-optimal outcomes across a diverse range of datasets.

References

  1. 1. Zhang Y, Tang Q. Accelerating autonomy: an integrated perception digital platform for next generation self-driving cars using faster R-CNN and DeepLabV3. Soft Computing. 2024; 28: 1633–1652. https://doi.org/10.1007/s00500-023-09510-0.
  2. 2. Yang Y, Zhang C B, Song X, Dong Z, Zhu H. S, Li W. J. Contextualized knowledge graph embedding for explainable talent training course recommendation. ACM Transaction on Information Systems. 2023; 42(2): 1–27. https://doi.org/10.1145/3597022.
  3. 3. Devi K. J, Sudha S. V. A novel panoptic segmentation model for lung tumor prediction using deep learning approaches. Soft Computing. 2024; 28: 2637–2648. https://doi.org/10.1007/s00500-023-09569-9.
  4. 4. Yang Y, Wei H, Zhu H, Yu D, Xiong H, Yang J. Exploiting Cross-Modal Prediction and Relation Consistency for Semisupervised Image Captioning. IEEE Transaction on Cybernetics. 2024, 54(2): 890–902. pmid:35895659
  5. 5. Kollem S. A fast computational technique based on a novel tangent sigmoid anisotropic diffusion function for image-denoising. Soft Computing. 2024. https://doi.org/10.1007/s00500-024-09628-9.
  6. 6. Yang Y, Yang J. Q, Zhan D. C, Zhu H. S, Gao X. R, et al. Corporate Relative Valuation Using Heterogeneous Multi-Modal Graph Neural Network. IEEE Transaction on Knowledge and Data Engineering. 2023; 35(1): 211–224. https://doi.org/10.1109/TKDE.2021.3080293.
  7. 7. Li Y, Yang J, Wen J. Entropy-based redundancy analysis and information screening. Digital Communications and Networks. 2023; 9(5): 1061–1069. https://doi.org/10.1016/j.dcan.2021.12.001.
  8. 8. Chao X, Li Y. Semisupervised few-shot remote sensing image classification based on knn distance entropy. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2022; 15: 8798–8805. https://doi.org/10.1109/JSTARS.2022.3213749.
  9. 9. Li Y, Yang J, Zhang Z, Wen J, Kumar P. Healthcare data quality assessment for cybersecurity intelligence. IEEE Transactions on Industrial Informatics. 2023; 19(1): 841–848. https://doi.org/10.1109/TII.2022.3190405.
  10. 10. Li Y.; Ercisli S. Explainable human-in-the-loop healthcare image information quality assessment and selection. CAAI Transaction on Intelligence Technology. 2023.
  11. 11. Patrini G, Rozza A, Menon A, Nock R, Qu L. Making neural networks robust to label noise: A loss correction approach. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2017; 2233–2242. https://doi.org/10.1109/CVPR.2017.240.
  12. 12. Arazo E, Ortego D, Albert P, O’Connor N, Mcguinness K. Unsupervised label noise modeling and loss correction. Proceedings of International Conference on Machine Learning. 2019; 48: 1125–1234.
  13. 13. Xiao T, Xia T, Yang Y, Huang C, Wang X. Learning from massive noisy labeled data for image classification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2015; 2691–2699. https://doi.org/10.1109/CVPR.2015.7298885.
  14. 14. Goldberger J, Reuven E. B. Training deep neural networks using a noise adaptation layer. Proceedings of International Conference Learning Representation. 2017.
  15. 15. Han B, Yao J, Niu G, Zhou M, Tsang I, Zhang Y, et al. Masking: A new perspective of noisy supervision. Advances in Neural Information Processing Systems. 2018; 5836–5846.
  16. 16. Yao J, Wang J, Tsang I, Zhang Y, Sun J, Zhang C, et al. Deep learning from noisy image labels with quality embedding. IEEE Transaction on Image Processing. 2018; 28(4): 1909–1922. pmid:30369444
  17. 17. Cheng L, Zhou X, Zhao L, Li D, Shang H, Zheng Y, et al. Weakly supervised learning with side information for noisy labeled images. Proceedings of European Conference on Computer Vision. 2020; 306–321.
  18. 18. Song H, Kim M, Lee J. G. Selfie: Refurbishing unclean samples for robust deep learning. Proceedings of International Conference on Machine Learning. 2019; 5907–5915.
  19. 19. Tu Y, Zhang B, Li Y, Liu L, Li J, Wang Y, et al. Learning from noisy labels with decoupled meta label purifier. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 19934–19943. https://doi.org/10.1109/CVPR52729.2023.01909.
  20. 20. Wei Q, Feng L, Sun H, Wang R, Guo C, Yin Y. Fine-grained classification with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 11651–11660. https://doi.org/10.1109/CVPR52729.2023.01121.
  21. 21. Zhang Q, Lee F, Wang Y, Ding D, Yao W, Chen L. An joint end-to-end framework for learning with noisy labels. Applied Soft Computing. 2021; 108. https://doi.org/10.1016/j.asoc.2021.107426.
  22. 22. Liu Y. D, He W. B. SELC: Self-ensemble label correction improves learning with noisy labels. Proceedings of the International Joint Conference on Artificial Intelligence. 2022.
  23. 23. Yi K, Wu J. Probabilistic end-to-end noise correction for learning with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2019; 7010–7018. https://doi.org/10.1109/CVPR.2019.00718.
  24. 24. Li J, Li G, Liu F, Yu Y. Neighborhood collective estimation for noisy label identification and correction. Proceedings of European Conference on Computer Vision. 2022; 13684. https://doi.org/10.1007/978-3-031-20053-3_8.
  25. 25. Zhang Z. L, Sabuncu M. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in Neural Information Processing Systems. 2018; 8778–8788.
  26. 26. Lyu Y, Tsang I. Curriculum loss: Robust learning and generalization against label corruption. Proceedings of International Conference Learning Representation. 2020.
  27. 27. Zhou X, Liu X, Jiang J, Gao X, Ji X. Asymmetric loss functions for learning with noisy labels. Proceedings of International Conference on Machine Learning. 2021.
  28. 28. Wang Y, Ma X, Chen Z, Luo Y, Yi J, Bailey J. Symmetric cross entropy for robust learning with noisy labels. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019; 322–330. https://doi.org/10.1109/ICCV.2019.00041.
  29. 29. Han B, Yao Q. M, Yu X. R, Niu G, Xu M, Tsang T, et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels. Advances in Neural Information Processing Systems. 2018; 31.
  30. 30. Zhao G, Li G, Qin Y, Liu F, Yu Y. Centrality and consistency: Two-stage clean samples identification for learning with instance-dependent noisy labels. Proceedings of European Conference on Computer Vision. 2022; 13685. https://doi.org/10.1007/978-3-031-19806-9_2.
  31. 31. Ortego D, Arazo E, Albert P, O’Connor N. E, McGuinness K. Multi-objective interpolation training for robustness to label noise. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2021; 6602–6611. https://doi.org/10.1109/CVPR46437.2021.00654.
  32. 32. Xia Q. Q, Lee F. F, Chen Q. TCC-net: A two-stage training method with contradictory loss and co-teaching based on meta-learning for learning with noisy labels. Information Sciences. 2023; 639: 119008. https://doi.org/10.1016/j.ins.2023.119008.
  33. 33. Wei H. X, Feng L, Chen X. Y, B An. Combating noisy labels by agreement: A joint training method with co-regularization. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2020. https://doi.org/10.1109/CVPR42600.2020.01374.
  34. 34. Li J, Socher R, Hoi S. DivideMix: Learning with noisy labels as semi-supervised learning. Proceedings of International Conference Learning Representation. 2020.
  35. 35. Karim N, Rizve M N, Rahnavard N, Mian A, Shah M. UNICON: Combating label noise through uniform selection and contrastive learning. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2022; 9666–9676. https://doi.org/10.1109/CVPR52688.2022.00945.
  36. 36. Feng C. W, Ren Y. L, Xi X. K. OT-Filter: An optimal transport filter for learning with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 16164–16174. https://doi.org/10.1109/CVPR52729.2023.01551.
  37. 37. Li S. K, Xia X. B, Ge S. M, Liu T. L. Selective-supervised contrastive learning with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2022; 316–325. https://doi.org/10.1109/CVPR52688.2022.00041.
  38. 38. Yao Y. Z, Sun Z. R, Zhang C. Y, Shen F. M, Wu Q, Zhang J, et al. Jo-SRC: A contrastive approach for combating noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2021; 5188–5197. https://doi.org/10.1109/CVPR46437.2021.00515.
  39. 39. Huang Z. Z, Zhang J. P, Shan H. M. Twin contrastive learning with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 11661–11670. https://doi.org/10.1109/CVPR52729.2023.01122.
  40. 40. Li Y, Han H, Shan S, Chen X. DISC: Learning from noisy labels via dynamic instance-specific selection and correction. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 24070–24079 https://doi.org/10.1109/CVPR52729.2023.02305.
  41. 41. Cordeiro F. R, Sachdeva R, Belagiannis V, Reid I, Carneiro G. LongReMix: Robust learning with high confidence samples in a noisy label environment. Pattern Recognition. 2023; 133: 109013. https://doi.org/10.1016/j.patcog.2022.109013.
  42. 42. Li H, Wei T, Yang H, Hu K, Peng C, Sun L, et al. Stochastic feature averaging for learning with long tailed noisy labels. Proceedings of the International Joint Conference on Artificial Intelligence. 2023; 3902–3910. https://doi.org/10.24963/ijcai.2023/434.
  43. 43. Song H, Kim M, Park D, Shin Y, Lee J. G. Learning from noisy labels with deep neural networks: A survey. IEEE Transaction on Neural Networks and Learning Systems. 2022; 34(11): 8135–8153. https://doi.org/10.1109/TNNLS.2022.3152527.
  44. 44. Gui X. J, Wang W, Tian Z. H. Towards understanding deep learning from noisy labels with small-loss criterion. Proceedings of the International Joint Conference on Artificial Intelligence. 2021; 2469–2475. https://doi.org/10.24963/ijcai.2021/340.
  45. 45. Berthelot D, Carlini N, Goodfellow I. J, Papernot N, Oliver A, Raffel C. Mixmatch: A holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems. 2019; 5049–5059.
  46. 46. Sohn K, Berthelot D, Li C, Zhang Z, Carlini N, Cubuk E. D, et al. FixMatch: Simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems. 2020.
  47. 47. Wu T, Liu Z, Huang Q, Wang Y, Lin D. Adversarial robustness under long-tailed distribution Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2021; 8655–8664. https://doi.org/10.1109/CVPR46437.2021.00855.
  48. 48. Pang S, Wang W, Zhang R, Hao W. Hierarchical block aggregation network for long-tailed visual recognition. Neurocomputing. 2024; 549. https://doi.org/10.1016/j.neucom.2023.126463.
  49. 49. Zhang Y, Kang B, Hooi B, Yan S. Deep long-tailed learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023; 45(9): 10795–10816. pmid:37074896
  50. 50. Ghosh A, Kumar H, Sastry P. Robust loss functions under label noise for deep neural networks. Proceedings of the Association for the Advancement of Artificial Intelligence. 2017; 1919–1925.
  51. 51. Zhang Q, Zhu Y, Yang M, Jin G, Zhu Y W, Chen Q. Cross-to-merge training with class balance strategy for learning with noisy labels. Expert Systems with Applications. 2024; 249: 123846. https://doi.org/10.1016/j.eswa.2024.123846.
  52. 52. Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. Proceedings of International Conference on Machine Learning. 2020.
  53. 53. Zhang H, Cisse M, Dauphin Y. N, Lopez-Paz D. Mixup: Beyond empirical risk minimization. Proceedings of International Conference Learning Representation. 2018.
  54. 54. Zhang Q, Jin G, Zhu Y, Wei H, Chen Q. BPT-PLR: A Balanced Partitioning and Training Framework with Pseudo-Label Relaxed Contrastive Loss for Noisy Label Learning. Entropy. 2024; 26: 589. pmid:39056952
  55. 55. Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Computer Science Department, University of Toronto, 2009. URL: https://www.cs.toronto.edu/∼kriz/learning-features-2009-TR.pdf.
  56. 56. Wei J, Zhu Z, Cheng H, Liu T, Niu G, Liu Y. Learning with noisy labels revisited: A study using real-world human annotations. Proceedings of International Conference Learning Representation. 2022.
  57. 57. Li W, Wang L, Li W, Agustsson E, Gool L. Webvision database: Visual learning and understanding from web data. arXiv 2024; arXiv:1708.02862. https://doi.org/10.48550/arXiv.1708.02862.
  58. Xiao T, Xia T, Yang Y, Huang C, Wang X. Learning from massive noisy labeled data for image classification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2015; 2691–2699. https://doi.org/10.1109/CVPR.2015.7298885.
  59. Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. Autoaugment: Learning augmentation policies from data. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2019; 113–123.
  60. Yu X, Han B, Yao J, Niu G, Tsang I, Sugiyama M. How does disagreement help generalization against label corruption. Proceedings of International Conference on Machine Learning. 2019; 7164–7173.
  61. Ghosh A, Lan A. Contrastive learning improves model robustness under label noise. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2021; 2697–2702. https://doi.org/10.1109/CVPRW53098.2021.00304.
  62. Liu S, Niles-Weed J, Razavian N, Fernandez-Granda C. Early-learning regularization prevents memorization of noisy labels. Advances in Neural Information Processing Systems. 2020.
  63. Liu Y, Guo HY. Peer loss functions: Learning from noisy labels without knowing noise rates. Proceedings of International Conference on Machine Learning. 2020.
  64. Li XF, Liu TL, Han B, Niu G, Sugiyama M. Provably end-to-end label-noise learning without anchor points. Proceedings of International Conference on Machine Learning. 2021; 6403–6413.
  65. Chen H, Shah A, Wang J, Tao R, Wang Y, Xie X, et al. Imprecise label learning: A unified framework for learning with various imprecise label configurations. arXiv 2023; arXiv:2305.12715. https://doi.org/10.48550/arXiv.2305.12715.
  66. Albert P, Arazo E, Krishna T, O'Connor N, McGuinness K. Is your noise correction noisy? PLS: Robustness to label noise with two stage detection. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023; 118–127. https://doi.org/10.1109/WACV56688.2023.00020.
  67. Liu S, Zhu Z, Qu Q, You C. Robust training under label noise by over-parameterization. Proceedings of International Conference on Machine Learning. 2022; 14153–14172.
  68. Li J, Xiong C, Hoi S. Learning from noisy data with robust representation learning. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021; 9465–9474. https://doi.org/10.1109/ICCV48922.2021.00935.
  69. Yin N, Shen L, Wang M, Luo X, Luo Z, Tao D. OMG: Towards effective graph classification against label noise. IEEE Transactions on Knowledge and Data Engineering. 2023; 35(12): 12873–12886. https://doi.org/10.1109/TKDE.2023.3271677.
  70. Zhang H, Wu B, Yuan X, Pan S, Tong H, Pei J. Trustworthy graph neural networks: aspects, methods, and trends. Proceedings of the IEEE. 2024; 112(2): 97–139. https://doi.org/10.1109/JPROC.2024.3369017.