
An improved sample selection framework for learning with noisy labels

  • Qian Zhang ,

    Roles Conceptualization, Funding acquisition, Methodology, Writing – original draft

    usstzhangqian@163.com

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Yi Zhu,

    Roles Supervision

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Ming Yang,

    Roles Supervision

    Affiliation School of Computer and Electronic Information, Nanjing Normal University, Nanjing, Jiangsu, China

  • Ge Jin,

    Roles Validation

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Yingwen Zhu,

    Roles Validation

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Yanjun Lu,

    Roles Visualization

    Affiliation School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China

  • Yu Zou,

    Roles Visualization

    Affiliations School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China, School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science & Technology, Nanjing, Jiangsu, China

  • Qiu Chen

    Roles Project administration, Writing – review & editing

    Affiliation Department of Electrical Engineering and Electronics, Graduate School of Engineering, Kogakuin University, Tokyo, Japan

Abstract

Deep neural networks have powerful memory capabilities, yet they frequently suffer from overfitting to noisy labels, leading to a decline in classification and generalization performance. To address this issue, sample selection methods that filter out potentially clean labels have been proposed. However, there is a significant gap in size between the filtered, possibly clean subset and the unlabeled subset, which becomes particularly pronounced at high-noise rates. Consequently, this results in underutilizing label-free samples in sample selection methods, leaving room for performance improvement. This study introduces an enhanced sample selection framework with an oversampling strategy (SOS) to overcome this limitation. This framework leverages the valuable information contained in label-free instances to enhance model performance by combining an SOS with state-of-the-art sample selection methods. We validate the effectiveness of SOS through extensive experiments conducted on both synthetic noisy datasets and real-world datasets such as CIFAR, WebVision, and Clothing1M. The source code for SOS will be made available at https://github.com/LanXiaoPang613/SOS.

Introduction

Deep neural networks (DNNs) are widely adopted for various vision tasks such as object detection [1], course recommendation [2], segmentation [3], image captioning [4], image denoising [5], and corporate relative valuation [6], because of their excellent learning capabilities. However, DNNs require high-quality annotated training data, which can be costly to collect and may contain data redundancy [7]. Image data sourced from the internet often require manual annotation through crowdsourcing platforms, a process that consumes a significant amount of time and resources, making it less efficient. Furthermore, the crowdsourcing mechanism, which involves cross-annotating and voting, introduces instances with inaccurate annotations, known as noisy labels. Unfortunately, the robust memory capabilities of DNNs can lead to models overfitting noisy labels, resulting in reduced accuracy and discrimination. To address this challenge, one approach is to manually label a portion of the samples and treat the remaining samples as unlabeled data, processing them using semi-supervised techniques [8] and data quality assessment methods [9, 10] to improve performance. Another approach is to enhance model robustness through learning with noisy labels (LNL) methods. Early research in LNL, including techniques such as loss adjustment [11, 12], noise transition matrix estimation using robust architectures [13–17], label correction based on DNN predictions [18–24], robust loss functions [25–28], and sample selection [29–41], has shown promising results.

Loss adjustment involves modifying the loss of all training samples before backward propagation in DNNs to mitigate the influence of samples with noisy labels. Some approaches focus on designing robust architectures to model the noise transition matrix T ∈ [0,1]^{c×c}, which reflects the probability of one category being mislabeled as another, i.e., Tij := p(yη = j | y = i), where c is the number of categories, yη = j means the noisy label of a sample belongs to the j-th class, and y = i means the corresponding ground-truth label belongs to the i-th class. Researchers have also explored using DNNs to correct noisy labels, leveraging the predictive capabilities of these models; this technique is called label correction. [42] has demonstrated that using a suitably modified loss function enables models trained on noisy datasets to achieve optimal Bayes risk, similar to their performance on clean datasets. Consequently, research on robust loss functions focuses on designing functions that enable models to effectively learn from clean labels while avoiding overfitting to noisy samples. Among numerous research avenues, sample selection methods have garnered widespread attention due to their state-of-the-art (SOTA) performance and ability to purify noisy datasets. As such, they have become mainstream in current LNL research. As the term suggests, sample selection aims to detect noisy labels and then filter out a subset of potentially clean labels for training. Early sample selection methods used dedicated architectures or training strategies to identify and remove potentially noisy samples. Although these methods achieved advanced performance, they left the information within noisy samples unutilized. Consequently, current SOTA sample selection methods incorporate semi-supervised learning (SSL) techniques for robust training, treating noisy samples as unlabeled data.
However, these methods exhibit a significant gap between the size of the filtered, potentially clean subset and the remaining unlabeled subset, particularly at high-noise levels. This gap, as shown in the experimental results in the Experimental results on synthetic datasets section, means that the label-free samples are not fully exploited, indicating potential for performance enhancement. To address this, we propose an improved sample selection framework with an oversampling strategy (SOS). Inspired by UNICON [35], LongReMix [41], and SFA [42], SOS is a simple yet efficient method that mines useful information in label-free instances by combining an oversampling strategy with robust SSL techniques. This enhancement further boosts model performance. As evidenced by the experimental results in the Experimental results on synthetic datasets section, SOS maintains stable performance under high-noise conditions, demonstrating exceptional robustness against noisy labels. We applied SOS to several benchmark datasets, as in previous sample selection methods, and extensive experimental comparisons validate the effectiveness of our approach. Our main contributions are as follows:

  1. We introduce a straightforward but effective sample selection framework with an oversampling strategy, which further utilizes information in label-free samples to achieve SOTA performance.
  2. We propose a uniform sample selection approach, diverging from methods that predominantly rely on estimated noisy posterior probability, to enhance the robustness of DNNs and improve performance.
  3. We introduce an oversampling strategy that complements SOTA one-stage sample selection methods for dataset division and robust training, setting our approach apart from two-stage methods and long-tailed learning research.
  4. SOS exhibits more stable performance than current methods under high-noise levels, with a faster convergence rate.
  5. Through extensive experiments across various noise types and rates, we demonstrate the superiority of SOS over existing SOTA methods, particularly under high-noise conditions.

The rest of this study is organized as follows: The Related works section reviews relevant studies on LNL. The Methodology section details the proposed SOS framework. The Experiments section discusses experimental results, and the Conclusion section presents the conclusions.

Related works

LNL research has emerged as a prominent area of study. The main research directions in this field can be categorized into five groups, as summarized in [43], including loss adjustment, noise transition matrix estimation using robust architecture, label correction based on DNN predictions, robust loss functions, and sample selection. Our work intersects with research on sample selection and long-tailed distribution.

Research on sample selection for LNL

Sample selection methods have garnered significant attention for their SOTA performance. In general, these methods fall into two categories: those utilizing supervised techniques and those employing semi-supervised or unsupervised learning techniques. Research [12, 44] has empirically and theoretically shown that DNNs tend to fit clean samples initially but subsequently overfit noisy samples, resulting in lower losses for clean samples in early epochs. Therefore, early sample selection methods developed dedicated architectures or training strategies to identify and eliminate noisy labels based on this phenomenon. This entire process is supervised, involving only samples with potentially clean labels. For instance, Han et al. [29] introduced the co-teaching strategy using two duplicate networks to select clean samples based on the small loss criterion. Wei et al. [33] proposed the JoCoR framework, an evolution of the co-teaching strategy, where a sample is selected only if both networks predict the same category. Xia et al. [32] combined two multilayer perceptrons (MLPs) with co-teaching to identify clean samples for cross-training the networks.

More recent SOTA methods still filter samples with noisy labels based on loss, but instead of discarding them, they remove the labels and treat these samples as an unlabeled subset. This approach results in a labeled subset and an unlabeled subset, enabling semi-supervised or unsupervised learning techniques for training. Li et al. [34], for example, were the first to incorporate existing SSL techniques into sample selection for LNL, using a GMM to split the data and applying methods such as MixMatch [45] or FixMatch [46] for robust training. Following this, Ortego et al. [31] integrated contrastive learning (CL) to learn robust representations and categorize training samples into label-free and labeled sets. Karim et al. [35] combined Jensen-Shannon divergence (JSD) with unsupervised CL to facilitate robust training under noisy labels. In contrast, Li et al. [37] employed supervised CL to conduct sample selection and robust training. Similarly, Yao et al. [38] utilized JSD to estimate the samples’ likelihood of being clean or noisy, thereby categorizing training data and employing CL to enhance model robustness. Recently, Feng et al. [36] introduced the optimal transport theory to the sample selection process, yielding excellent performance. Diverging from previous methods that solely rely on loss [30, 34] to estimate the posterior probability of a sample containing a noisy label, Huang et al. [39] detected noisy samples by employing two GMMs to establish the relationship between representation and label-noisy annotations. Unlike earlier sample selection methods, which depend on a preset fixed threshold and are ineffective as epochs increase, Li et al. [40] propose a dynamic instance-specific selection method for LNL. Contrasting with previous sample selection methods that incorporate SSL techniques within the same training process for sample selection and robustness training, Cordeiro et al. [41] split these two optimization objectives into two processes.
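To make the shared loss-based splitting idea concrete, the sketch below fits a tiny two-component one-dimensional mixture to per-sample losses and keeps the low-mean ("small-loss") component as the probably-clean set. This is a simplified stand-in for the GMM used in DivideMix-style methods; all names and the hand-rolled EM loop are ours, not taken from any of the cited papers.

```python
import numpy as np

def fit_two_gaussians(losses, iters=50):
    """Tiny 1-D EM for a two-component Gaussian mixture over per-sample
    losses. Returns the posterior probability of the low-mean ("clean")
    component for every sample."""
    x = np.asarray(losses, float)
    mu = np.array([x.min(), x.max()])          # initialize far apart
    var = np.array([x.var(), x.var()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = pi / np.sqrt(2 * np.pi * var) * \
            np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(1, keepdims=True)
        # M-step: update weights, means, and variances
        nk = resp.sum(0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(0) / nk + 1e-6
    clean = int(np.argmin(mu))                 # low-loss component = likely clean
    return resp[:, clean]

# synthetic losses: 700 "clean" low-loss samples, 300 "noisy" high-loss ones
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.2, 0.05, 700), rng.normal(2.0, 0.3, 300)])
clean_mask = fit_two_gaussians(losses) > 0.5
```

In practice DivideMix uses scikit-learn's `GaussianMixture` on normalized losses; the EM loop above just makes the mechanism self-contained.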

Research on long-tailed learning

Existing long-tailed learning (LTL) methods primarily address training datasets characterized by well-annotated yet imbalanced class distributions. In such datasets, some classes possess numerous samples, while others have significantly fewer. Most LTL methods, including those referenced in [44, 47–49], employ random oversampling and undersampling strategies to re-balance class representation during training. In LNL, the clean subset, being smaller than the label-free subset, presents a long-tailed distribution challenge, an area that has received limited research attention. To our knowledge, LongReMix [41] is the pioneer in integrating an oversampling strategy into LNL. However, it is a two-stage method that does not fully capitalize on the information available from label-free samples.

Methodology

If all labels are well-annotated for a dataset with c classes, it can be regarded as a clean dataset D = {(xi, yi)}, i = 1, …, N, where xi ∈ ℜρ is the i-th input image, and yi ∈ {0,1}c denotes its corresponding one-hot target. In this study, we employ two networks for training, each comprising an extractor g(⋅), a classifier f(⋅), and a projection header h(⋅) similar to simCLR [52]. The feature extracted by the extractor for xi ∈ ℜρ is g(xi), and the representation via the projection header is denoted as hi = h(g(xi)). The DNN prediction for xi is denoted as f(g(xi)), simplified as f(xi). Typically, classification training methods predominantly utilize the Cross-Entropy (CE) loss function to train the DNNs. The optimization objective can be expressed as follows: (1) ℓCE = −(1/N) Σi Σj yi,j log fj(xi), where fj(xi) is the j-th component of the softmax prediction.

Considering the gradient of Eq (1) and the powerful fitting capability of DNNs, it is clear that deep neural network models trained with the CE loss attempt to fit all labels to the greatest extent possible. However, when the observed target yi in the dataset contains inaccurate annotations (noisy labels), the DNNs, aided by the CE loss, will also fit these noisily labeled samples as iterations increase, leading to a significant decrease in classification performance and generalization ability [25, 50, 51]. This phenomenon is also known as the overfitting of DNNs to noisy labels in the LNL field [40]. Therefore, LNL aims to learn a global minimizer on noisy datasets that has the same probability of misclassification as f* (a global minimizer of Eq (1) on the noise-free dataset) [50]. The comprehensive framework of the SOS is depicted in Fig 1, while Algorithm 2 provides an in-depth illustration of the process.
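For reference, the CE objective of Eq (1) can be computed numerically as follows (a minimal sketch; the function name is ours):

```python
import numpy as np

def cross_entropy(preds, onehots):
    """Eq (1) sketch: mean cross-entropy over N samples,
    -(1/N) * sum_i sum_j y_ij * log f_j(x_i)."""
    preds = np.clip(np.asarray(preds, float), 1e-12, 1.0)  # avoid log(0)
    return float(-(np.asarray(onehots) * np.log(preds)).sum(axis=1).mean())

# a perfect prediction gives (near-)zero loss; a uniform one gives log(c)
loss_perfect = cross_entropy([[1.0, 0.0], [0.0, 1.0]], [[1, 0], [0, 1]])
loss_uniform = cross_entropy([[0.5, 0.5]], [[1, 0]])
```

Because this objective keeps shrinking as long as any label is imperfectly fit, a sufficiently expressive network will eventually drive the loss down on mislabeled samples too, which is the overfitting behavior described above.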

Fig 1. Overall framework of the SOS.

Each network consists of an extractor g(⋅), a classifier f(⋅), and a projection header h(⋅), similar to simCLR [52]. During training, the dataset is divided into a labeled subset and an unlabeled subset through uniform selection, and then the useful information in the label-free samples is extracted by combining the oversampling strategy with a robust training process, where the robust training is performed using the SSL training technique MixMatch and unsupervised CL. In each network, the two subsets, derived separately from the two networks, are utilized for training. The training process is cyclic, involving repeated iterations of these steps.

https://doi.org/10.1371/journal.pone.0309841.g001

Uniform selection approach

The SOS method combines an additional oversampling mechanism with the uniform sample selection approach and robust training. To elucidate, we first introduce the uniform selection approach. Traditional sample selection methods predominantly depend on the estimated noise posterior probability, derived using a GMM to model the loss distribution. However, these methods do not ensure a uniform number of samples for each class in the clean set. Addressing this limitation, Karim et al. [35] propose the common uniform sample selection, which employs JSD instead of a GMM for detecting noisy labels. As illustrated in Fig 1, two pre-trained networks are utilized to partition the training set. The predictions of these networks for the same input xi via the softmax layer are denoted as p1(xi) and p2(xi), respectively, where pc1(xi) is the c-th component of the prediction from network 1. JSD is used to measure the disagreement di between the averaged prediction p̄i = (p1(xi) + p2(xi))/2 of the two networks and the observed one-hot label yi: (2) di = JSD(yi, p̄i), where the JSD function is (3) JSD(yi, p̄i) = KL(yi‖mi)/2 + KL(p̄i‖mi)/2 with mi = (yi + p̄i)/2, and KL(⋅‖⋅) is the Kullback–Leibler divergence. Here, (4) KL(yi‖mi) = Σj yi,j log(yi,j/mi,j), and KL(p̄i‖mi) can be calculated using a similar formula as Eq (4).
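The disagreement measure of Eqs (2)–(4) can be sketched as follows (variable and function names are ours; we compute the JSD between the averaged prediction of the two networks and the observed one-hot label):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, with a small epsilon."""
    p, q = np.asarray(p) + 1e-12, np.asarray(q) + 1e-12
    return float(np.sum(p * np.log(p / q)))

def jsd(p, q):
    """Eq (3): Jensen-Shannon divergence, symmetric and bounded in [0, log 2]."""
    m = 0.5 * (np.asarray(p) + np.asarray(q))   # Eq (4)'s midpoint distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def disagreement(p1, p2, y_onehot):
    """Eq (2) sketch: JSD between the averaged network predictions
    and the observed one-hot label."""
    return jsd(0.5 * (np.asarray(p1) + np.asarray(p2)), y_onehot)

# a confidently correct prediction yields low disagreement with its label
d_clean = disagreement([0.9, 0.05, 0.05], [0.85, 0.10, 0.05], [1, 0, 0])
d_noisy = disagreement([0.05, 0.9, 0.05], [0.10, 0.85, 0.05], [1, 0, 0])
```

The bounded range of the JSD is what makes a global threshold such as dts in Eq (7) meaningful across classes and epochs.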

The sample selection is based on the disagreement calculated for all training data. Assuming that the sorted (ascending) disagreements of the samples whose observed label is the j-th class are denoted dj(1) ≤ dj(2) ≤ ⋯, j ∈ {1, 2, …, c}, a sample is selected if its disagreement value falls within the first R portion of all the disagreement values for the class indicated by its observed label. This can be mathematically expressed as follows: (5)

R is calculated as follows: (6) where 1(⋅) is an indicator function, and dts is determined by Eq (7): (7) where davg and dmin are the average and minimum disagreement values over all training samples, respectively. τ and dμ are two hyperparameters used for transferring more samples to the unlabeled set; thus, under high-noise rates, the size of the labeled set Dl (Eq (5)) is generally smaller than the size of the unlabeled set Dul = {(xi) | xi ∈ Dη∖Dl}. We illustrate the proportions of labeled and unlabeled samples divided by several SOTA sample selection methods through bar charts [54]. As shown in Figs 2–5, it is evident that in high-noise-rate scenarios (i.e., from 40%-asym. to 80%-sym. on CIFAR-10, and from 50%-sym. to 80%-sym. on CIFAR-100), the sizes of the labeled subsets partitioned by these methods, including the baseline method (UNICON) in this paper, are significantly smaller than those of the unlabeled subsets.
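A minimal sketch of the per-class selection rule of Eq (5), assuming the selection ratio R has already been computed via Eqs (6)–(7) (function and variable names are ours):

```python
import numpy as np

def uniform_select(d, labels, R, c):
    """Eq (5) sketch: within each observed class j, keep the samples whose
    disagreement d falls in the smallest R fraction (ascending order), so
    every class contributes roughly the same proportion to the labeled set."""
    keep = np.zeros(len(d), dtype=bool)
    for j in range(c):
        idx = np.where(labels == j)[0]      # samples observed as class j
        k = int(np.ceil(R * len(idx)))      # size of the first R portion
        keep[idx[np.argsort(d[idx])[:k]]] = True
    return keep

# six samples, two classes, R = 0.5: the two lowest-d samples per class stay
d = np.array([0.10, 0.90, 0.20, 0.80, 0.30, 0.70])
labels = np.array([0, 0, 0, 1, 1, 1])
keep = uniform_select(d, labels, R=0.5, c=2)
```

Selecting per class rather than globally is what prevents easy classes from crowding out hard ones in the labeled subset.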

Fig 2. The proportions of labeled and unlabeled samples divided by existing SOTA methods on CIFAR datasets with 40%-asym.

https://doi.org/10.1371/journal.pone.0309841.g002

Fig 3. The proportions of labeled and unlabeled samples divided by existing SOTA methods on CIFAR datasets with 80%-sym.

https://doi.org/10.1371/journal.pone.0309841.g003

Fig 4. The proportions of labeled and unlabeled samples divided by existing SOTA methods on CIFAR datasets with 50%-sym.

https://doi.org/10.1371/journal.pone.0309841.g004

Fig 5. The proportions of labeled and unlabeled samples divided by existing SOTA methods on CIFAR datasets with 80%-sym.

https://doi.org/10.1371/journal.pone.0309841.g005

Robust SSL training with oversampling strategy

Most current SOTA sample selection methods employ SSL techniques to train DNNs simultaneously on labeled and unlabeled sets. They sample an equal number of labeled and unlabeled samples in each epoch, where the number is dependent on the size of the labeled subset. This strategy is effective for low-noise scenarios as the number of labeled samples is greater than that of unlabeled samples (as shown in Figs 2 and 4). However, as shown in Figs 3 and 5, under high-noise conditions, the size of the labeled set is often smaller than that of the unlabeled set. Consequently, this imbalance can interrupt robust SSL training due to the premature depletion of labeled samples in the data-loader. Such interruptions prevent many unlabeled samples from being learned by the DNNs, leaving room for performance enhancement.

To address this issue, we introduce an oversampling strategy. Commonly used in LTL, oversampling involves sampling more frequently from classes with fewer instances (rare classes) to maintain class balance. In this study, since |Dl| ≪ |Dul|, we consider Dl as a rare class and Dul as a massive class. To prevent training from prematurely terminating due to the exhaustion of labeled samples in the data-loader, it is necessary to oversample more training data from the labeled set. This approach compels the DNNs to assimilate more useful information from the unlabeled samples, which the previous SSL training process might have overlooked. Consequently, the pseudocode for the robust SSL training incorporating the oversampling strategy, inspired by [41], is presented below.

  1. Algorithm 1 oversampling during robust SSL training
  2. Input: the labeled set Dl and the unlabeled set Dul, two networks f1 and f2, batch-size b, robust SSL training function Fssl (Blabeled,Bunlabeled,f1,f2);
  3. Calculate the number of iterations in the labeled set based on the batch size b: ;
  4. Initialize the current iteration of the labeled data-loader: itercont = 0;
  5. while itercont < itermax:
  6.  sample a mini-batch Blabeled from Dl;
  7.  sample a mini-batch Bunlabeled = {(xi)∈Dul;i ∈{1, …,b}} from Dul;
  8.  perform robust SSL training based on the current two mini-batches: Fssl (Blabeled,Bunlabeled,f1,f2);
  9.  add the iteration of the labeled data-loader: itercont += 1;
  10. if itercont ≥ itermax:
  11.   break;
  12. end if
  13. end while
  14. Output: two networks f1 and f2.
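One plausible reading of Algorithm 1's oversampling in code: cycle the (smaller) labeled loader so that every unlabeled mini-batch is consumed, rather than stopping when the labeled loader is exhausted. This is a sketch with our own names, using plain lists in place of data-loaders:

```python
from itertools import cycle

def oversampled_batches(labeled, unlabeled, b):
    """Pair every unlabeled mini-batch with a labeled mini-batch, cycling
    (i.e., oversampling) the labeled set when it runs out, so robust SSL
    training never stops early for lack of labeled samples."""
    starts = cycle(range(0, len(labeled), b))     # restart labeled loader
    for u in range(0, len(unlabeled), b):
        l = next(starts)
        yield labeled[l:l + b], unlabeled[u:u + b]

labeled = list(range(10))      # |D_l| = 10, much smaller under high noise
unlabeled = list(range(100))   # |D_ul| = 100
pairs = list(oversampled_batches(labeled, unlabeled, b=10))
```

In a PyTorch implementation the same effect is typically achieved with a sampler that draws from the labeled set with replacement.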

Since these unlabeled samples have several potential representations useful for unsupervised CL, we adhere to the methodologies in [31, 35, 37] to integrate the CL method into the robust SSL training process post-division to further improve the robustness and performance of the model.

Below, we describe the robust SSL training method Fssl (Blabeled,Bunlabeled,f1,f2) employed in this study. Consistent with the MixMatch-for-LNL approach used in previous works [34–40], for each sample xi ∈ Blabeled ∪ Bunlabeled, we initially perform two weak data augmentations, i.e., xi,1, xi,2 = w_da(xi). Subsequently, label co-guessing is performed for both labeled and unlabeled samples, as depicted below: (8) When xi belongs to Blabeled, the refined label is derived by weighting the original observed target and the predictions of the current training network fcur on the two weakly augmented views xi,1 and xi,2. Otherwise, when xi belongs to Bunlabeled, the guessed label is determined solely by averaging the predictions of the two networks f1 and f2 on the two weakly augmented views, and sharpen(⋅) is expressed as follows: (9) sharpen(p̄i, S)j = p̄i,j^(1/S) / Σk p̄i,k^(1/S), where S is the temperature parameter, p̄i is the average prediction of the two networks on the two weak data augmentations of input xi, and p̄i,j represents the j-th component of p̄i. After the label co-guessing step, we replace the original label of sample xi with the refined label. Subsequently, we combine the refined label with two instances of strong data augmentation xi,3, xi,4 = s_da(xi) to form two new pairs. Finally, two new mini batches are obtained: (10)

Here Blabeled and Bunlabeled are the two mini batches sampled from Dl and Dul respectively, as illustrated in Algorithm 1, and b is the batch size. Through Eq (10), each input xi in Blabeled and Bunlabeled is augmented into two inputs xi,3 and xi,4 with strong data augmentation operation.
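The sharpening operation of Eq (9) is the standard MixMatch temperature sharpening, which can be sketched as:

```python
import numpy as np

def sharpen(p, S=0.5):
    """Eq (9): temperature sharpening. Raising each component of the
    averaged prediction p to the power 1/S (with S < 1) and renormalizing
    concentrates the mass on the dominant class."""
    p = np.asarray(p, float) ** (1.0 / S)
    return p / p.sum()

q = sharpen([0.5, 0.3, 0.2], S=0.5)  # the leading class gains probability mass
```

Lower S yields harder pseudo-labels; S → 0 approaches a one-hot argmax.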

The loss function of the SSL training based on the refined labels at the t-th epoch is expressed as follows: (11) where N1 is the size of the mixed labeled batch, N2 is the size of the mixed unlabeled batch, and π represents a c-dimensional vector in which each element is 1/c (i.e., to keep the model’s predictions uniformly distributed). CE is given in Eq (1), and λu and λreg are predefined tradeoffs. Furthermore, the mixed batches are generated from the two mini-batches in Eq (10) using Mixup [53], as employed in previous studies. The process of generation is as follows: (12) x̂ = λxa + (1 − λ)xb, ŷ = λya + (1 − λ)yb, where λ ~ Beta(α, α) and α is a predefined hyperparameter. Following the methodologies outlined in [31, 35, 37], we integrate the CL method into the robust SSL training process, yielding a total optimization objective expressed as follows: (13) Ltotal = Lssl + λcl·Lcl, where the contrastive loss Lcl is computed on the strongly augmented inputs obtained from Eq (10), and λcl = 0.025 is a coefficient. Ultimately, we employ Eq (13) and Algorithm 1 to determine the global minimizer during the robust SSL training.
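The Mixup interpolation of Eq (12) can be sketched as follows (a minimal version; some implementations additionally take λ ← max(λ, 1 − λ) so the first input dominates, which we omit here):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=4.0, rng=None):
    """Eq (12) sketch: interpolate two input/label pairs with a
    Beta(alpha, alpha)-distributed coefficient lam."""
    rng = np.random.default_rng(0) if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

# mixing an all-zeros and an all-ones input with their one-hot labels
xm, ym = mixup(np.zeros(4), np.array([1.0, 0.0]),
               np.ones(4), np.array([0.0, 1.0]))
```

The mixed label ym remains a valid probability distribution, so it can be fed directly to the CE and consistency terms of Eq (11).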

Proposed SOS framework

We propose the SOS framework, an enhancement of the uniform sample selection method discussed in the Uniform selection approach section, integrated with the robust SSL training and oversampling strategy outlined in the Robust SSL training with oversampling strategy section. This one-stage method efficiently divides the training data and conducts robust training within the same epoch. An overview of our method is depicted in Fig 1, with the detailed workflow presented in Algorithm 2.

  1. Algorithm 2 SOS
  2. Input: the training set Dη, two networks f1, f2, the tradeoff factors λu, λreg, hyperparameter α, filtering factors dμ and τ, Tw is the warm-up epochs, Ttot is the total training epochs, temperature parameter S, batch-size b;
  3. for t = 1 to Ttot do:
  4. if t < Tw:
  5.   pre-train two networks f1 and f2 based on the original training set Dη using CE loss function;
  6. else:
  7.   //the training of network f1;
  8.   fcur = f1;
  9.   //divide the training data via the uniform sample selection approach as illustrated in the Uniform selection approach section;
  10.   construct the labeled and unlabeled sets Dl and Dul for network f1 using Eq (5);
  11.   perform robust SSL training on network f1 employing the oversampling strategy, as outlined in Algorithm 1;
  12.   //the training of network f2;
  13.   fcur = f2;
  14.   //divide the training data via the uniform sample selection approach as illustrated in the Uniform selection approach section;
  15.   construct the labeled and unlabeled sets Dl and Dul for network f2 using Eq (5);
  16.   perform robust SSL training on network f2 employing the oversampling strategy, as outlined in Algorithm 1;
  17. end if
  18. t = t +1  //incremental training epochs.
  19. end for
  20. Output: the labeled set Dl, two robust networks f1 and f2.
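The control flow of Algorithm 2 can be summarized as a short sketch (all function arguments are hypothetical stand-ins for the paper's components, passed in as callables):

```python
def train_sos(dataset, f1, f2, t_warm, t_total, divide, warmup_step, ssl_train):
    """High-level sketch of Algorithm 2: warm up both networks with CE,
    then, per epoch and per network, divide the data via uniform selection
    (Eq (5)) and run oversampled robust SSL training (Algorithm 1)."""
    for t in range(1, t_total + 1):
        if t < t_warm:
            warmup_step(dataset, f1, f2)        # CE pre-training on D_eta
        else:
            for f_cur in (f1, f2):
                d_l, d_ul = divide(dataset, f_cur)  # labeled/unlabeled split
                ssl_train(d_l, d_ul, f1, f2)        # Algorithm 1
    return f1, f2
```

Because division is redone every epoch with the latest networks, the labeled set adapts as the models improve, which is what makes this a one-stage method.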

Experiments

In this section, we evaluate the performance of the SOS framework on both synthetic and real-world datasets with noise. The characteristics of the datasets used are described below. It is worth noting that the hyperparameters used in the experiments for each dataset in this paper are largely consistent with those used in UNICON and DivideMix. All experiments on the CIFAR datasets in this paper are conducted on a server running Windows 11, equipped with a single 4090 GPU and a 13900K CPU. Experiments on other datasets are conducted on a server running Windows Server 2016, equipped with a single A800 GPU and a Xeon 6248 CPU. The IDE used for all experiments is PyCharm 2023, and the model framework is PyTorch 1.8.0.

The details of datasets

The details of CIFAR-10 and CIFAR-100.

Basic overview. CIFAR-10 and CIFAR-100 [55] are two clean datasets with 10 and 100 categories, respectively. Each dataset contains 50K training images and 10K testing images. Table 1 outlines their basic characteristics, and Figs 6 and 7 display sample features of some classes in CIFAR-10 and CIFAR-100.

Synthesis of noisy labels. Given the challenge of determining noise characteristics in real-world datasets, prior studies often utilize CIFAR-10/100 [55] to create controlled synthetic label noise at various rates, including both symmetric and asymmetric types, to test the efficacy of proposed methods. Table 1 summarizes these two datasets, noting that only the labels of the training data are altered with generated synthetic noisy labels. Symmetric noise implies that each sample’s target has a probability η/(c−1) of being randomly flipped to each of the other categories, and a 1−η chance of remaining unchanged. Asymmetric noise involves a fixed probability η of each target being mapped to a predetermined class. Notably, the asymmetric noise in CIFAR-10 mimics the structure of real-world label noise, exemplified by mappings such as truck→automobile, bird→airplane, deer→horse, and cat→dog. In CIFAR-10, the asymmetric noise transition matrix is relatively sparse, while in CIFAR-100, asymmetric noisy labels are generated by shifting each target to the subsequent category within its superclass, resulting in a denser transition matrix.
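The two noise models described above can be synthesized as follows (a sketch; function names and the mapping argument are ours):

```python
import numpy as np

def inject_symmetric_noise(labels, eta, c, rng=None):
    """Symmetric noise: each label is flipped with probability eta,
    uniformly to one of the other c - 1 classes, so any given wrong
    class receives probability eta / (c - 1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.array(labels)
    for i in np.where(rng.random(len(noisy)) < eta)[0]:
        noisy[i] = rng.choice([k for k in range(c) if k != noisy[i]])
    return noisy

def inject_asymmetric_noise(labels, eta, mapping, rng=None):
    """Asymmetric noise: each label moves to its predetermined target
    class (e.g., truck -> automobile) with probability eta; classes
    absent from the mapping are left unchanged."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.array(labels)
    for i in np.where(rng.random(len(noisy)) < eta)[0]:
        noisy[i] = mapping.get(int(noisy[i]), noisy[i])
    return noisy
```

For CIFAR-10, the asymmetric `mapping` would encode the class pairs listed above; for CIFAR-100, it would shift each class to the next one within its superclass.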

Experimental setup. To evaluate the robustness of SOS on these datasets, we use PreAct ResNet-18, aligning with previous studies. The hyperparameters for CIFAR-10/100 are detailed in Table 2. We set λu = 30 for all CIFAR-10 experiments, except for the 10%-asymmetric and 20%-symmetric label noise (10%-asym and 20%-sym) scenarios, where λu = 0. Although using customized hyperparameter settings for certain noise scenarios could achieve better performance, to demonstrate the robustness of the hyperparameters of our method and to avoid additional ablation experiments, as well as to ensure fair comparison with previous methods such as UNICON and DivideMix, we employ almost identical settings for all noise scenarios on this dataset. The hyperparameter settings are identical to UNICON. The learning rate undergoes linear reduction post-warm-up. The weak data augmentation (w_da) follows previous works’ protocols, such as mean subtraction and random flip, while the strong data augmentation (s_da) adopts the CIFAR10-Policy [59].

Table 2. Settings of the hyperparameters used in this study.

https://doi.org/10.1371/journal.pone.0309841.t002

The details of CIFAR-N.

Basic overview. Although synthetic label noise can be generated as described above, modeling real-world noise patterns accurately remains challenging [56]. In addition, many existing real-world noisy datasets complicate the analysis of proposed LNL methods due to the absence of true labels and their large sizes. To address this, Wei et al. [56] developed CIFAR-N, a controllable, moderately sized real-world noisy dataset based on the training images from CIFAR-10 and CIFAR-100. CIFAR-N comprises five types of noise labels for CIFAR-10: aggregate (9.03%), random1-3 (17.23%/18.23%/17.64%), and worst (40.21%); one type for CIFAR-100, namely noisy (40.2%). Table 1 outlines the basic characteristics of CIFAR-N and the features of this dataset are shown in Figs 6 and 7.

Experimental setup. The details of the hyperparameters used are outlined in Table 2. The learning rate undergoes linear reduction after warm-up, and we set λu = 0 for all noise types except for the worst and noisy types, where λu = 30. The data augmentation strategy follows that of CIFAR-10/100.

The details of WebVision.

Basic overview. WebVision [57] is a real-world noisy dataset comprising 2.4 million training images collected from the internet. Our evaluation of SOS utilizes the first 50 categories of the Google image subset, as done in previous studies. The noise rate is reported to be about 20% in previous works. Dataset specifics are documented in Table 1, while a visualization of some samples from this set is shown in Fig 8.

Experimental setup. The hyperparameters of this set are detailed in Table 2. The learning rate is linearly reduced after warm-up, and λu = 0. The “w_da” mirrors that of CIFAR-10/100, while the “s_da” adopts the ImageNet-Policy.

The details of Clothing1M.

Basic overview. Clothing1M [58] is a real-world noisy dataset with 14 classes of training data. This dataset was crawled from several shopping sites and contains 38.5% noisy labels in the training set. Its specifics are outlined in Table 1, and a visualization of samples is shown in Fig 9.

Fig 9. Visualizing samples from Clothing1M.

We randomly select 10 images from each of the first 10 categories for display from CIFAR datasets and 14 images from each of the first 14 categories from WebVision and Clothing1M.

https://doi.org/10.1371/journal.pone.0309841.g009

Experimental setup. The hyperparameters used in this study are detailed in Table 2. Notably, we exclusively utilize the 1M training images for training, without employing an extra clean validation set. We randomly sample 32K instances from the entire dataset to implement SOS at each epoch, following [34, 35, 37, 41]. The learning rate is linearly reduced after warm-up, and λu = 0. The data augmentation strategy employed is identical to that used in WebVision.

Experimental results on synthetic datasets

We implement the SOS framework on two synthetic noisy datasets, examining a range of noise rates and types. For the synthetic label noise, we present the results for symmetric noise at {20%, 50%, 80%, 90%} and asymmetric noise at {10%, 30%, 40%}, in line with the methodologies of previous studies.
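The synthetic corruption protocol can be sketched as below, following the common convention in [34, 35]: symmetric noise redraws a label uniformly over all classes (so a flip may land back on the true class), while asymmetric noise maps each class to a fixed confusable class. The transition map in the test is a hypothetical two-class example, not the full CIFAR mapping.

```python
import random

def symmetric_noise(labels, rate, num_classes, seed=0):
    """Redraw the labels of a `rate` fraction of samples uniformly at random."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), int(rate * len(labels))):
        noisy[i] = rng.randrange(num_classes)  # may re-draw the true class
    return noisy

def asymmetric_noise(labels, rate, transition, seed=0):
    """Flip each label to its designated confusable class with probability `rate`."""
    rng = random.Random(seed)
    return [transition[y] if rng.random() < rate and y in transition else y
            for y in labels]
```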

Results on CIFAR-10.

Table 3 compares the performance of SOS with other SOTA benchmarks on the CIFAR-10 dataset across two types of noise. Although most methods report only the best test accuracies for this dataset, we present both the highest accuracies achieved during the entire training process and the average test accuracies over the last 10 epochs. These results are provided for various noise rates and types. Notably, SOS surpasses nearly all other SOTA baselines. The only exceptions occur under the 40%-asym noise condition, where SOS is marginally outperformed by DISC and LongReMix by 0.4% and 0.5%, respectively. This slight underperformance is attributed to these two methods adapting their hyperparameters for different noise rates, whereas SOS employs consistent hyperparameters across conditions. However, in experiments with 50%/80%-sym noise, SOS leads these two methods significantly, by 0.8%/7% and 0.8%/1.4%, respectively. This demonstrates SOS's ability to maintain robustness and effectiveness across varying noise types and ratios. Furthermore, as detailed in the Proposed SOS framework section, when the noise ratio is excessively high (e.g., 50%-sym), the size of the labeled subset obtained by existing sample selection methods is considerably smaller than that of the unlabeled set. This discrepancy hinders DNNs from fully learning the representations in the unlabeled samples when employing robust training with CL and SSL techniques, thus limiting performance enhancement. To address this issue, our study introduces an oversampling strategy that encourages models to extract more representations from the unlabeled dataset, leading to substantial improvements even under high label noise. For instance, under the 90%-sym noise condition, SOS outperforms co-teaching+, DivideMix, MOIT+, Sel-CL+, UNICON, LongReMix, and TCL by 45%, 16%, 17.3%, 10.1%, 12%, and 1.2%, respectively. Even in asymmetric noise scenarios, SOS maintains superiority over nearly all methods except in the 40%-asym case.
Fig 10 illustrates the test accuracy of SOS on CIFAR-10. In addition, Fig 11 compares the classification performance of SOS and UNICON under the 50%-sym and 40%-asym conditions, showing that SOS not only converges faster than UNICON but also achieves superior performance. Moreover, we provide other common metrics for our method on CIFAR-10, such as precision, recall, F1-score, and the confusion matrix. The first three metrics are shown in Table 3, while the confusion matrices obtained on the test set at the final epoch are illustrated in Figs 12–18. Observing the accuracy, precision, recall, and F1-score results in Table 3, we can see that the four metrics are very close to each other. Since most existing LNL methods only report accuracy, we follow this practice in the subsequent experiments for fair comparison.
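For reference, the reported metrics can be computed from model predictions as in the following sketch; this is a plain-Python stand-in, and in practice one would typically use sklearn.metrics.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def macro_prf(cm):
    """Macro-averaged precision, recall, and F1-score from a confusion matrix."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        pred_c = sum(cm[r][c] for r in range(n))   # column sum: predicted as c
        true_c = sum(cm[c])                        # row sum: truly c
        p = tp / pred_c if pred_c else 0.0
        r = tp / true_c if true_c else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p); recalls.append(r); f1s.append(f)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```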

Fig 10. Curve of test accuracy on CIFAR-10.

The test results of SOS on CIFAR-10 with different noise rates.

https://doi.org/10.1371/journal.pone.0309841.g010

Fig 11. The comparison of classification performance between SOS and UNICON.

The steep drop in both figures marks the end of the warm-up stage.

https://doi.org/10.1371/journal.pone.0309841.g011

Fig 12. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 20%-sym scenario.

https://doi.org/10.1371/journal.pone.0309841.g012

Fig 13. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 50%-sym scenario.

https://doi.org/10.1371/journal.pone.0309841.g013

Fig 14. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 80%-sym scenario.

https://doi.org/10.1371/journal.pone.0309841.g014

Fig 15. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 90%-sym scenario.

https://doi.org/10.1371/journal.pone.0309841.g015

Fig 16. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 10%-asym scenario.

https://doi.org/10.1371/journal.pone.0309841.g016

Fig 17. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 30%-asym scenario.

https://doi.org/10.1371/journal.pone.0309841.g017

Fig 18. Confusion matrix of SOS on the test set of CIFAR-10 when trained in the 40%-asym scenario.

https://doi.org/10.1371/journal.pone.0309841.g018

Moreover, Table 4 presents a comparison of the test results between our proposed method and other SOTA methods, such as those mentioned in [35], under severe symmetric label noise conditions (specifically 90%, 92%, and 95%), which pose a massive challenge for previous methodologies. Impressively, our method demonstrates satisfactory performance even in these demanding scenarios. Notably, SOS significantly outperforms UNICON by 1.5%, 2.5%, and 1.5% in experiments conducted with 90%, 92%, and 95% symmetric label noise, respectively. Furthermore, the efficacy of our method is visually represented through the test accuracy curve shown in Fig 19, providing a more intuitive understanding of its performance. In the 95% symmetric noise scenario, taking the bird category as an example, current research on symmetric noise mostly flips the labels of 95% of bird samples to a category drawn uniformly at random, including the bird category itself [34, 35, 36, 51]. This means a certain proportion of samples still retain the bird label. Therefore, the number of clean-label samples for the bird category is in fact greater than 250 (i.e., 5%×5000), approaching 725 (i.e., 5%×5000+95%×5000/10). The distribution of samples for the other categories is similar. From a statistical perspective, correct classification therefore remains possible in this scenario. Unfortunately, DivideMix's performance deteriorates drastically, whereas UNICON and our method maintain stable clustering capabilities by introducing balanced selection and contrastive loss strategies for the feature extractors. Combined with the partitioned clean-label samples, this ensures the robustness of the classifier. Additionally, since SOS introduces an oversampling technique, it can assist the contrastive learning module in more thoroughly mining the information carried by unlabeled samples, thereby achieving better performance than UNICON in high-noise-ratio scenarios.
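The counting argument above can be verified in a few lines; the numbers are taken directly from the text (5000 samples per CIFAR-10 class, 95% symmetric noise, 10 classes).

```python
n_per_class, noise_rate, num_classes = 5000, 0.95, 10

# Labels that are never selected for flipping stay clean.
kept_clean = (1 - noise_rate) * n_per_class
# A flipped label lands back on the true class with probability 1/num_classes.
flipped_back = noise_rate * n_per_class / num_classes
expected_correct = kept_clean + flipped_back
```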

Fig 19. Test curve of SOS and UNICON on CIFAR-10 with severe label noise.

“SOS (30)” and “SOS (50)” represent the results of our method when λu = 30 and λu = 50, respectively.

https://doi.org/10.1371/journal.pone.0309841.g019

Table 4. Classification performance on CIFAR-10 with heavy symmetric noise.

https://doi.org/10.1371/journal.pone.0309841.t004

Results on CIFAR-100.

Table 5 presents the highest test accuracy achieved on CIFAR-100 throughout the entire training process, as well as the average test accuracies over the last ten epochs. In these comparisons, SOS consistently outperforms nearly all other SOTA methods. Notably, even when the noise rate is relatively low (a scenario where current methods match the accuracy obtained on clean datasets), SOS still maintains a significant lead. More impressively, SOS's superiority is evident in severe noise environments, such as 80%-sym. For instance, under the 80%-sym condition, SOS surpasses co-teaching, DivideMix, ELR+, UNICON, DISC, LongReMix, and TCL by 65%, 10%, 9%, 6%, 12%, 7%, and 5%, respectively. This indicates that SOS can approximate a global minimizer comparable to one obtained on clean datasets. However, it is important to acknowledge that when dealing with higher noise rates and a larger number of categories, the SOS method falls slightly behind the current SOTA methods. This limitation stems from the uniform selection approach and the consistent use of the same hyperparameters during training. Fortunately, datasets with such characteristics are rare in practice. In addition, no single method consistently outperforms SOS, highlighting the stability of our approach. For a more intuitive comparison, the test accuracy curve of SOS on CIFAR-100 is provided in Fig 20. Fig 21 compares SOS and UNICON under the 50%-sym and 30%-asym conditions, clearly demonstrating SOS's superiority over current SOTA methods.
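The two statistics reported throughout (best accuracy over all epochs, and the mean over the final ten) follow a simple protocol, sketched here with illustrative numbers:

```python
def summarize_accuracy(acc_per_epoch, last_n=10):
    """Return (best test accuracy over all epochs, mean over the last `last_n`)."""
    tail = acc_per_epoch[-last_n:]
    return max(acc_per_epoch), sum(tail) / len(tail)
```

Reporting the tail average alongside the best value exposes methods whose accuracy peaks early and then degrades as the network memorizes noisy labels.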

Experimental results on real-world datasets

We apply the SOS method to three real-world noisy datasets. Specifically, for the CIFAR-N dataset, which contains real-world noise, we comprehensively compare SOS's performance against the current SOTA methods across various noise types.

Results on CIFAR-N.

Table 6 presents the results obtained using SOS on the CIFAR-N dataset, with Figs 22 and 23 illustrating a comparative analysis between SOS and UNICON on this dataset. CIFAR-N, a real-world noisy dataset compiled via crowdsourcing platforms, offers a more realistic assessment of the methods’ effectiveness and robustness. Unlike most methods that report only the best test accuracies on this dataset, we provide both the highest accuracies throughout the training process and the average test accuracies over the final 10 epochs, encompassing various noise rates and types. In addition, we have included results for UNICON obtained by running its publicly available code. Consistently, SOS outperforms almost all other SOTA baselines, particularly under severe label noise scenarios, such as “worst” and “noisy.” It is noteworthy that many SOTA methods adjust their hyperparameter settings for the seven noise types, whereas we maintained almost identical settings to those used for CIFAR-10/100. Despite this, our method demonstrates superior performance. For instance, under the “worst” label noise condition (CIFAR-10N), SOS surpasses co-teaching, DivideMix, ELR+, JoCoR+, UNICON, ILL, and PLS by 11%, 2.4%, 3.8%, 0.6%, 1.4%, and 1.2%, respectively. In the more challenging “noisy” scenario on CIFAR-100N, SOS still leads these methods by margins of 13%, 2.1%, 14%, 1.6%, 5.2%, and 0.1%. These results across various experiments on this dataset demonstrate that SOS, with its robust representation learning from unlabeled samples, consistently outperforms other SOTA methods. Furthermore, as shown in Figs 22 and 23, SOS converges faster and achieves higher accuracy than current SOTA methods.

Results on WebVision.

Table 7 details the highest test accuracy achieved on WebVision throughout the training process. In line with prior research, we report the test accuracies for the validation set in WebVision and the ILSVRC2012 validation set. In addition, we include both top-1 and top-5 accuracies for these two validation sets. Although SOS is somewhat less effective than LongReMix on WebVision, it is important to note that LongReMix utilizes a two-stage sample selection method, which requires twice the training time of SOS. This difference in training time becomes more significant in large-scale datasets. Although TCL achieved the highest performance on WebVision, surpassing our method by 1.1%, our method outperformed TCL on ILSVRC12 with a margin of 1.44%. Thus, overall, the method presented in this study demonstrates greater stability and consistently better performance across both validation sets, which is a significant advantage in practical applications.
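The top-1 and top-5 accuracies reported in Table 7 follow the standard definition, sketched here in plain Python: a prediction counts as correct if the true class appears among the k highest-scoring classes.

```python
def topk_accuracy(logits, targets, k=1):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    correct = 0
    for scores, target in zip(logits, targets):
        # Rank class indices by descending score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        correct += target in ranked[:k]
    return correct / len(targets)
```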

Table 7. Results on WebVision using pre-trained ResNet-50.

https://doi.org/10.1371/journal.pone.0309841.t007

Results on Clothing1M.

Table 8 displays the highest test accuracy achieved on the Clothing1M dataset over all training epochs. Although TCL and UNICON show a slight performance edge over SOS, it is important to consider the differences in the running environments and the substantial scale of Clothing1M, which comprises one million images. Given these factors, SOS, TCL, and UNICON exhibit comparable performance on this dataset. We replicated UNICON using its publicly available code and maintained the same hyperparameter settings as those used in our method. The replicated results presented in Table 8 are 1.0% lower than UNICON's originally reported results, suggesting that these three methods are essentially on par in terms of performance. Furthermore, SOS demonstrates competitiveness with, or superiority over, other LNL methods from 2023, such as LongReMix, OT-Filter, DISC, and TCL.

Table 8. Results on Clothing1M using pre-trained ResNet-50.

https://doi.org/10.1371/journal.pone.0309841.t008

Ablation results

As shown in Tables 2–4, we utilize nearly identical hyperparameter settings for the different noise scenarios on the CIFAR datasets. In contrast, existing LNL methods such as DivideMix, LongReMix, and DISC (which employ the same SSL technique as our method) typically require real-time adjustment of multiple hyperparameters, including thresholds and loss weights, based on the noise type and noise ratio; the fact that our settings remain fixed sufficiently demonstrates the robustness of our approach. Additionally, our method is inspired by UNICON, which has already shown that DNNs are insensitive to the hyperparameters listed in Table 2 when using a uniform selection approach and SSL techniques. Therefore, a sensitivity analysis of the hyperparameters with respect to noise information is not repeated here; please refer to Section 4.3 of [34] and Section 11 of [35] for further analyses. Moreover, SOS consistently applies the same hyperparameter settings across all datasets, except for λu, whose effects have been analyzed in previous works [34, 35, 41], revealing that SOS is relatively insensitive to variations in hyperparameter settings. Consequently, our analysis focuses primarily on the effects of the oversampling strategy, comparing scenarios with (w/) and without (w/o) this strategy in terms of test accuracy, specifically on the CIFAR-10 and CIFAR-N datasets. Table 9 presents the results of the ablation study, reporting only the best outcomes observed throughout all epochs. The results indicate that the oversampling strategy enhances model performance by leveraging the feature representations in label-free samples more effectively. This finding supports the utility of noisy samples in improving the classification performance and robustness of DNNs.
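For intuition, the ablated oversampling step can be pictured with the following hypothetical sketch (not the released implementation): the smaller labeled subset is repeated and randomly topped up until its effective size matches the unlabeled subset, so mini-batches draw on both pools at comparable rates.

```python
import random

def oversample_indices(labeled_idx, unlabeled_size, seed=0):
    """Repeat and randomly top up labeled indices to match the unlabeled size."""
    rng = random.Random(seed)
    if len(labeled_idx) >= unlabeled_size:
        return list(labeled_idx)
    # Whole repeats first, then a random remainder without replacement.
    out = list(labeled_idx) * (unlabeled_size // len(labeled_idx))
    out += rng.sample(labeled_idx, unlabeled_size - len(out))
    rng.shuffle(out)
    return out
```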

Discussion

Efficiency

Table 10 compares the training times of our method, UNICON, and DivideMix on CIFAR-10 with 50% symmetric noise and 40% asymmetric noise. All experiments are conducted on a server equipped with a single NVIDIA RTX 4090 GPU, with the "num_worker" of the dataloader set to 0 in all experiments. As the table clearly indicates, although the introduction of the oversampling technique adds some training time, the overhead is not significant. Since training is performed only once, the testing-phase time is identical, and we achieve better performance, we consider the additional overhead acceptable.

Table 10. The training time cost (hours, i.e., “h”) on CIFAR-10 with 50% symmetric noise.

https://doi.org/10.1371/journal.pone.0309841.t010

Pros

Our extensive experimental comparisons, encompassing both synthetic and real-world noisy datasets, have demonstrated the effectiveness of SOS. Our experiments on CIFAR-10/100 show that SOS offers several advantages over current methods such as DISC, LongReMix, and UNICON, particularly in practical applications. These advantages include robustness to hyperparameter variations, faster convergence, and superior performance under high-noise conditions. The results from three real-world noisy datasets further corroborate these advantages. In addition, we have validated the effectiveness of the oversampling strategy through comprehensive ablation studies on both synthetic and real-world datasets. We have established that the label-free samples identified after sample selection significantly enhance model performance, an aspect overlooked in earlier studies.

Limitation

Despite the aforementioned advantages of our method, limitations are inevitable. Following previous research, we discuss the limitations of our method from the perspectives of robustness [41], generalization [69], and trustworthiness [70]. First, the experiments on synthetic noise datasets show that our method exhibits excellent robustness in most noise scenarios. However, in high-noise scenarios with a large number of classes, its robustness still has shortcomings. For example, on CIFAR-100 with 40% asymmetric noise and 90% symmetric noise, our method significantly lags behind the SOTA methods TCL and DISC, respectively (e.g., 52.0% vs 54.5%, and 74.9% vs 76.5%). This is mainly because, under such extreme noise conditions with many classes, the number of noisy samples is very close to the number of clean samples. For instance, in the 40% asymmetric noise scenario, the apple class has 500 samples, at least 40% of which are flipped to mushrooms. In this case, the number of clean samples in the apple class is at most 300, and more than 200 samples are labeled as mushrooms. Under these circumstances, our balanced partitioning mechanism faces a significant challenge. In CIFAR-10, however, since the number of samples per class increases to 5000, the gap between clean and noisy samples widens, resulting in better performance. In other words, this issue can be mitigated by increasing the number of samples per class. Additionally, the experimental results on WebVision reveal that, although our method reduces the performance gap on WebVision and ILSVRC12 compared with existing methods such as TCL and LongReMix, a considerable difference remains (i.e., 1.3%), indicating that our method's generalization is insufficient.
Finally, in the 40% asymmetric noise scenario on CIFAR-100, the close proximity of noisy and clean samples leads to suboptimal performance of our sample partitioning strategy, indicating poor trustworthiness in this scenario. Therefore, improving the model's confidence on noisy samples is a worthwhile focus for future work.

Adaptability

As illustrated above, one significant advantage of our method is that it achieves better performance while being insensitive to hyperparameters, thus eliminating the need for prior noise information and enhancing its usability. Additionally, as demonstrated by our experiments on several benchmark datasets, our method consistently achieves similar performance across different datasets, indicating its strong generalization capability. Therefore, we believe that our method can be applied to a variety of noise scenarios, e.g., symmetric, asymmetric, and mixed (i.e., real-world [40]) noisy datasets.

Conclusion

In this study, we propose SOS, an enhanced sample selection framework with an oversampling strategy designed for LNL. Our research has identified a gap in current sample selection methods: their inability to fully harness the potential of the representations in label-free samples, which is crucial for boosting model robustness and performance. SOS addresses this issue by integrating an oversampling strategy with SOTA SSL methods. Through comprehensive experiments, we have established that SOS exhibits low sensitivity to hyperparameter variations and consistently delivers optimal or near-optimal outcomes across a diverse range of datasets.

References

  1. 1. Zhang Y, Tang Q. Accelerating autonomy: an integrated perception digital platform for next generation self-driving cars using faster R-CNN and DeepLabV3. Soft Computing. 2024; 28: 1633–1652. https://doi.org/10.1007/s00500-023-09510-0.
  2. 2. Yang Y, Zhang C B, Song X, Dong Z, Zhu H. S, Li W. J. Contextualized knowledge graph embedding for explainable talent training course recommendation. ACM Transaction on Information Systems. 2023; 42(2): 1–27. https://doi.org/10.1145/3597022.
  3. 3. Devi K. J, Sudha S. V. A novel panoptic segmentation model for lung tumor prediction using deep learning approaches. Soft Computing. 2024; 28: 2637–2648. https://doi.org/10.1007/s00500-023-09569-9.
  4. 4. Yang Y, Wei H, Zhu H, Yu D, Xiong H, Yang J. Exploiting Cross-Modal Prediction and Relation Consistency for Semisupervised Image Captioning. IEEE Transaction on Cybernetics. 2024, 54(2): 890–902. pmid:35895659
  5. 5. Kollem S. A fast computational technique based on a novel tangent sigmoid anisotropic diffusion function for image-denoising. Soft Computing. 2024. https://doi.org/10.1007/s00500-024-09628-9.
  6. 6. Yang Y, Yang J. Q, Zhan D. C, Zhu H. S, Gao X. R, et al. Corporate Relative Valuation Using Heterogeneous Multi-Modal Graph Neural Network. IEEE Transaction on Knowledge and Data Engineering. 2023; 35(1): 211–224. https://doi.org/10.1109/TKDE.2021.3080293.
  7. 7. Li Y, Yang J, Wen J. Entropy-based redundancy analysis and information screening. Digital Communications and Networks. 2023; 9(5): 1061–1069. https://doi.org/10.1016/j.dcan.2021.12.001.
  8. 8. Chao X, Li Y. Semisupervised few-shot remote sensing image classification based on knn distance entropy. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2022; 15: 8798–8805. https://doi.org/10.1109/JSTARS.2022.3213749.
  9. 9. Li Y, Yang J, Zhang Z, Wen J, Kumar P. Healthcare data quality assessment for cybersecurity intelligence. IEEE Transactions on Industrial Informatics. 2023; 19(1): 841–848. https://doi.org/10.1109/TII.2022.3190405.
  10. 10. Li Y.; Ercisli S. Explainable human-in-the-loop healthcare image information quality assessment and selection. CAAI Transaction on Intelligence Technology. 2023.
  11. 11. Patrini G, Rozza A, Menon A, Nock R, Qu L. Making neural networks robust to label noise: A loss correction approach. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2017; 2233–2242. https://doi.org/10.1109/CVPR.2017.240.
  12. 12. Arazo E, Ortego D, Albert P, O’Connor N, Mcguinness K. Unsupervised label noise modeling and loss correction. Proceedings of International Conference on Machine Learning. 2019; 48: 1125–1234.
  13. 13. Xiao T, Xia T, Yang Y, Huang C, Wang X. Learning from massive noisy labeled data for image classification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2015; 2691–2699. https://doi.org/10.1109/CVPR.2015.7298885.
  14. 14. Goldberger J, Reuven E. B. Training deep neural networks using a noise adaptation layer. Proceedings of International Conference Learning Representation. 2017.
  15. 15. Han B, Yao J, Niu G, Zhou M, Tsang I, Zhang Y, et al. Masking: A new perspective of noisy supervision. Advances in Neural Information Processing Systems. 2018; 5836–5846.
  16. 16. Yao J, Wang J, Tsang I, Zhang Y, Sun J, Zhang C, et al. Deep learning from noisy image labels with quality embedding. IEEE Transaction on Image Processing. 2018; 28(4): 1909–1922. pmid:30369444
  17. 17. Cheng L, Zhou X, Zhao L, Li D, Shang H, Zheng Y, et al. Weakly supervised learning with side information for noisy labeled images. Proceedings of European Conference on Computer Vision. 2020; 306–321.
  18. 18. Song H, Kim M, Lee J. G. Selfie: Refurbishing unclean samples for robust deep learning. Proceedings of International Conference on Machine Learning. 2019; 5907–5915.
  19. 19. Tu Y, Zhang B, Li Y, Liu L, Li J, Wang Y, et al. Learning from noisy labels with decoupled meta label purifier. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 19934–19943. https://doi.org/10.1109/CVPR52729.2023.01909.
  20. 20. Wei Q, Feng L, Sun H, Wang R, Guo C, Yin Y. Fine-grained classification with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 11651–11660. https://doi.org/10.1109/CVPR52729.2023.01121.
  21. 21. Zhang Q, Lee F, Wang Y, Ding D, Yao W, Chen L. An joint end-to-end framework for learning with noisy labels. Applied Soft Computing. 2021; 108. https://doi.org/10.1016/j.asoc.2021.107426.
  22. 22. Liu Y. D, He W. B. SELC: Self-ensemble label correction improves learning with noisy labels. Proceedings of the International Joint Conference on Artificial Intelligence. 2022.
  23. 23. Yi K, Wu J. Probabilistic end-to-end noise correction for learning with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2019; 7010–7018. https://doi.org/10.1109/CVPR.2019.00718.
  24. 24. Li J, Li G, Liu F, Yu Y. Neighborhood collective estimation for noisy label identification and correction. Proceedings of European Conference on Computer Vision. 2022; 13684. https://doi.org/10.1007/978-3-031-20053-3_8.
  25. 25. Zhang Z. L, Sabuncu M. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in Neural Information Processing Systems. 2018; 8778–8788.
  26. 26. Lyu Y, Tsang I. Curriculum loss: Robust learning and generalization against label corruption. Proceedings of International Conference Learning Representation. 2020.
  27. 27. Zhou X, Liu X, Jiang J, Gao X, Ji X. Asymmetric loss functions for learning with noisy labels. Proceedings of International Conference on Machine Learning. 2021.
  28. 28. Wang Y, Ma X, Chen Z, Luo Y, Yi J, Bailey J. Symmetric cross entropy for robust learning with noisy labels. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019; 322–330. https://doi.org/10.1109/ICCV.2019.00041.
  29. 29. Han B, Yao Q. M, Yu X. R, Niu G, Xu M, Tsang T, et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels. Advances in Neural Information Processing Systems. 2018; 31.
  30. 30. Zhao G, Li G, Qin Y, Liu F, Yu Y. Centrality and consistency: Two-stage clean samples identification for learning with instance-dependent noisy labels. Proceedings of European Conference on Computer Vision. 2022; 13685. https://doi.org/10.1007/978-3-031-19806-9_2.
  31. 31. Ortego D, Arazo E, Albert P, O’Connor N. E, McGuinness K. Multi-objective interpolation training for robustness to label noise. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2021; 6602–6611. https://doi.org/10.1109/CVPR46437.2021.00654.
  32. 32. Xia Q. Q, Lee F. F, Chen Q. TCC-net: A two-stage training method with contradictory loss and co-teaching based on meta-learning for learning with noisy labels. Information Sciences. 2023; 639: 119008. https://doi.org/10.1016/j.ins.2023.119008.
  33. 33. Wei H. X, Feng L, Chen X. Y, B An. Combating noisy labels by agreement: A joint training method with co-regularization. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2020. https://doi.org/10.1109/CVPR42600.2020.01374.
  34. 34. Li J, Socher R, Hoi S. DivideMix: Learning with noisy labels as semi-supervised learning. Proceedings of International Conference Learning Representation. 2020.
  35. 35. Karim N, Rizve M N, Rahnavard N, Mian A, Shah M. UNICON: Combating label noise through uniform selection and contrastive learning. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2022; 9666–9676. https://doi.org/10.1109/CVPR52688.2022.00945.
  36. 36. Feng C. W, Ren Y. L, Xi X. K. OT-Filter: An optimal transport filter for learning with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 16164–16174. https://doi.org/10.1109/CVPR52729.2023.01551.
  37. 37. Li S. K, Xia X. B, Ge S. M, Liu T. L. Selective-supervised contrastive learning with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2022; 316–325. https://doi.org/10.1109/CVPR52688.2022.00041.
  38. 38. Yao Y. Z, Sun Z. R, Zhang C. Y, Shen F. M, Wu Q, Zhang J, et al. Jo-SRC: A contrastive approach for combating noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2021; 5188–5197. https://doi.org/10.1109/CVPR46437.2021.00515.
  39. 39. Huang Z. Z, Zhang J. P, Shan H. M. Twin contrastive learning with noisy labels. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 11661–11670. https://doi.org/10.1109/CVPR52729.2023.01122.
  40. 40. Li Y, Han H, Shan S, Chen X. DISC: Learning from noisy labels via dynamic instance-specific selection and correction. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2023; 24070–24079 https://doi.org/10.1109/CVPR52729.2023.02305.
  41. 41. Cordeiro F. R, Sachdeva R, Belagiannis V, Reid I, Carneiro G. LongReMix: Robust learning with high confidence samples in a noisy label environment. Pattern Recognition. 2023; 133: 109013. https://doi.org/10.1016/j.patcog.2022.109013.
  42. 42. Li H, Wei T, Yang H, Hu K, Peng C, Sun L, et al. Stochastic feature averaging for learning with long tailed noisy labels. Proceedings of the International Joint Conference on Artificial Intelligence. 2023; 3902–3910. https://doi.org/10.24963/ijcai.2023/434.
  43. 43. Song H, Kim M, Park D, Shin Y, Lee J. G. Learning from noisy labels with deep neural networks: A survey. IEEE Transaction on Neural Networks and Learning Systems. 2022; 34(11): 8135–8153. https://doi.org/10.1109/TNNLS.2022.3152527.
  44. 44. Gui X. J, Wang W, Tian Z. H. Towards understanding deep learning from noisy labels with small-loss criterion. Proceedings of the International Joint Conference on Artificial Intelligence. 2021; 2469–2475. https://doi.org/10.24963/ijcai.2021/340.
  45. 45. Berthelot D, Carlini N, Goodfellow I. J, Papernot N, Oliver A, Raffel C. Mixmatch: A holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems. 2019; 5049–5059.
  46. 46. Sohn K, Berthelot D, Li C, Zhang Z, Carlini N, Cubuk E. D, et al. FixMatch: Simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems. 2020.
  47. 47. Wu T, Liu Z, Huang Q, Wang Y, Lin D. Adversarial robustness under long-tailed distribution Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2021; 8655–8664. https://doi.org/10.1109/CVPR46437.2021.00855.
  48. 48. Pang S, Wang W, Zhang R, Hao W. Hierarchical block aggregation network for long-tailed visual recognition. Neurocomputing. 2024; 549. https://doi.org/10.1016/j.neucom.2023.126463.
  49. 49. Zhang Y, Kang B, Hooi B, Yan S. Deep long-tailed learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023; 45(9): 10795–10816. pmid:37074896
  50. 50. Ghosh A, Kumar H, Sastry P. Robust loss functions under label noise for deep neural networks. Proceedings of the Association for the Advancement of Artificial Intelligence. 2017; 1919–1925.
  51. 51. Zhang Q, Zhu Y, Yang M, Jin G, Zhu Y W, Chen Q. Cross-to-merge training with class balance strategy for learning with noisy labels. Expert Systems with Applications. 2024; 249: 123846. https://doi.org/10.1016/j.eswa.2024.123846.
  52. 52. Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. Proceedings of International Conference on Machine Learning. 2020.
  53. 53. Zhang H, Cisse M, Dauphin Y. N, Lopez-Paz D. Mixup: Beyond empirical risk minimization. Proceedings of International Conference Learning Representation. 2018.
  54. 54. Zhang Q, Jin G, Zhu Y, Wei H, Chen Q. BPT-PLR: A Balanced Partitioning and Training Framework with Pseudo-Label Relaxed Contrastive Loss for Noisy Label Learning. Entropy. 2024; 26: 589. pmid:39056952
  55. 55. Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Computer Science Department, University of Toronto, 2009. URL: https://www.cs.toronto.edu/∼kriz/learning-features-2009-TR.pdf.
  56. 56. Wei J, Zhu Z, Cheng H, Liu T, Niu G, Liu Y. Learning with noisy labels revisited: A study using real-world human annotations. Proceedings of International Conference Learning Representation. 2022.
  57. 57. Li W, Wang L, Li W, Agustsson E, Gool L. Webvision database: Visual learning and understanding from web data. arXiv 2024; arXiv:1708.02862. https://doi.org/10.48550/arXiv.1708.02862.
  58. Xiao T, Xia T, Yang Y, Huang C, Wang X. Learning from massive noisy labeled data for image classification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2015; 2691–2699. https://doi.org/10.1109/CVPR.2015.7298885.
  59. Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. Autoaugment: Learning augmentation policies from data. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2019; 113–123.
  60. Yu X, Han B, Yao J, Niu G, Tsang I, Sugiyama M. How does disagreement help generalization against label corruption. Proceedings of International Conference on Machine Learning. 2019; 7164–7173.
  61. Ghosh A, Lan A. Contrastive learning improves model robustness under label noise. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2021; 2697–2702. https://doi.org/10.1109/CVPRW53098.2021.00304.
  62. Liu S, Niles-Weed J, Razavian N, Fernandez-Granda C. Early-learning regularization prevents memorization of noisy labels. Advances in Neural Information Processing Systems. 2020.
  63. Liu Y, Guo HY. Peer loss functions: Learning from noisy labels without knowing noise rates. Proceedings of International Conference on Machine Learning. 2020.
  64. Li XF, Liu TL, Han B, Niu G, Sugiyama M. Provably end-to-end label-noise learning without anchor points. Proceedings of International Conference on Machine Learning. 2021; 6403–6413.
  65. Chen H, Shah A, Wang J, Tao R, Wang Y, Xie X, et al. Imprecise label learning: A unified framework for learning with various imprecise label configurations. arXiv 2023; arXiv:2305.12715. https://doi.org/10.48550/arXiv.2305.12715.
  66. Albert P, Arazo E, Krishna T, O'Connor N, McGuinness K. Is your noise correction noisy? PLS: Robustness to label noise with two stage detection. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023; 118–127. https://doi.org/10.1109/WACV56688.2023.00020.
  67. Liu S, Zhu Z, Qu Q, You C. Robust training under label noise by over-parameterization. Proceedings of International Conference on Machine Learning. 2022; 14153–14172.
  68. Li J, Xiong C, Hoi S. Learning from noisy data with robust representation learning. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021; 9465–9474. https://doi.org/10.1109/ICCV48922.2021.00935.
  69. Yin N, Shen L, Wang M, Luo X, Luo Z, Tao D. OMG: Towards effective graph classification against label noise. IEEE Transactions on Knowledge and Data Engineering. 2023; 35(12): 12873–12886. https://doi.org/10.1109/TKDE.2023.3271677.
  70. Zhang H, Wu B, Yuan X, Pan S, Tong H, Pei J. Trustworthy graph neural networks: aspects, methods, and trends. Proceedings of the IEEE. 2024; 112(2): 97–139. https://doi.org/10.1109/JPROC.2024.3369017.