
The student-teacher framework guided by self-training and consistency regularization for semi-supervised medical image segmentation

  • Boliang Li,

    Roles Investigation, Methodology, Software, Visualization, Writing – original draft

    Affiliation Department of control science and engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China

  • Yaming Xu,

    Roles Data curation

    Affiliation Department of control science and engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China

  • Yan Wang ,

    Roles Supervision

    wang_yan_hit@163.com

    Affiliation Department of control science and engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China

  • Luxiu Li,

    Roles Validation, Writing – review & editing

    Affiliation Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China

  • Bo Zhang

    Roles Validation

    Affiliation Sergeant schools of Army Academy of Armored Forces, Changchun, Jilin, China

Abstract

Due to the high suitability of semi-supervised learning for medical image segmentation, a plethora of valuable research has been conducted and has achieved noteworthy success in this field. However, many approaches confine their focus to a single semi-supervised framework, thereby overlooking the potential enhancements in segmentation performance offered by integrating several frameworks. In this paper, we propose a novel semi-supervised framework named Pseudo-Label Mean Teacher (PLMT), which synergizes the self-training pipeline with pseudo-labeling and consistency regularization techniques. In particular, we integrate the student-teacher structure with consistency loss into the self-training pipeline to facilitate a mutually beneficial enhancement between the two methods. This structure not only generates remarkably accurate pseudo-labels for the self-training pipeline but also furnishes additional pseudo-label supervision for the student-teacher framework. Moreover, to explore the impact of different semi-supervised losses on the segmentation performance of the PLMT framework, we introduce adaptive loss weights. The PLMT can dynamically adjust the weights of the different semi-supervised losses during the training process. Extensive experiments on three public datasets demonstrate that our framework achieves the best performance and outperforms five other semi-supervised methods. The PLMT is an initial exploration of a framework that melds the self-training pipeline with consistency regularization and offers a comparatively novel perspective on semi-supervised image segmentation.

Introduction

The segmentation of medical images is a crucial part of clinical analysis, aiding experts in the diagnosis of diseases and the formulation of treatment plans [1]. Deep learning methods have recently demonstrated significant achievements in medical image segmentation [2–4]. However, these approaches rely heavily on annotated data, and the acquisition of labels is a complex and time-consuming process, which substantially encumbers the further development of deep learning methods in this domain [5]. Semi-supervised learning [6] is highly suitable for medical image segmentation tasks since it can effectively extract information from large amounts of unlabeled images. Therefore, effective semi-supervised approaches are increasingly emerging.

Among the various semi-supervised learning methods [7–10], approaches based on self-training and consistency regularization are widely employed due to their simplicity and efficacy. The crux of self-training is the pseudo-label. This approach first creates pseudo-labels for unlabeled images, then composes an extra training dataset of pseudo-labels and unlabeled images, and finally forces the segmentation model to learn effective information from unlabeled images through a pseudo-label supervised loss. Consequently, the precision of pseudo-labels is essential for realizing the best performance in self-training. In the consistency regularization domain, the student-teacher structure is one of the most widely used structures, for example, the Mean Teacher [9]. It furnishes identical inputs to both the student and teacher models but adds additional noise to the inputs of the student model, and supervises the student model using the outputs of the teacher model, thereby enforcing output consistency between the two models and realizing the low-density separation between different classes in semi-supervised methods.

However, many researchers have focused on developing entirely novel semi-supervised methods or enhancing single existing methods, neglecting the potential benefits of combining several semi-supervised approaches. In this study, we present a novel semi-supervised method called Pseudo-Label Mean Teacher (PLMT). It combines the self-training process with the consistency regularization method. In the PLMT framework, the student-teacher structure based on consistency regularization yields precise pseudo-labels, whereas the self-training pipeline provides additional pseudo-label supervision for the student-teacher structure. In essence, we amalgamate the two most prevalently employed semi-supervised methods to engender a mutually reinforcing effect.

Furthermore, since the PLMT framework includes pseudo-labeled and consistency losses, the weights of different semi-supervised losses represent the preference of the PLMT framework. It is necessary to consider the impact of different weights of the semi-supervised loss function. In response to this point, we introduce adaptive loss weights, which allow PLMT to dynamically adjust the weights to the optimal values for different tasks during training and thereby achieve the best segmentation accuracy.

An approach similar to ours is the UPC framework [11]. It also leverages both pseudo-labels and consistency regularization. Nevertheless, it diverges from our technique in that it directly utilizes the outputs of the teacher model as pseudo-labels. This strategy can compromise the accuracy of the pseudo-labels, accumulating errors and degrading performance. In contrast, our approach incorporates the student-teacher architecture within the self-training pipeline, thereby facilitating the generation of more precise pseudo-labels. Overall, the main contributions of this paper are summarized as follows:

  • We introduce a novel semi-supervised medical image segmentation framework named PLMT, which integrates consistency regularization with the self-training pipeline.
  • By adaptively adjusting the weights of the two kinds of unsupervised losses, the PLMT framework can take full advantage of the benefits of both pseudo-label and consistency regularization.
  • Experimental results from three datasets demonstrate that our framework can effectively extract task-relevant information from unlabeled samples and outperforms the other five semi-supervised methods.

Related works

Consistency regularization

Consistency regularization is one of the most widely applied methods for semi-supervised learning. The basic principle is that the network should present consistent outputs for noisy samples. In other words, tiny perturbations should not alter the classification results of the network for the same inputs.

Many consistency regularization approaches have been developed. For instance, the temporal ensembling [12] method employs a self-ensembling strategy, enforcing consistency between the predictions of two augmentations of the same sample and the network predictions across previous epochs. Tarvainen et al. [9] proposed the student-teacher structure and explored the prediction consistency of models with different parameters. Miyato et al. [13] introduced a virtual adversarial training method that employs adversarial training to establish a consistency constraint between the outputs of unlabeled samples and those with adversarial noise. Additionally, various studies [14–17] have also proposed other effective consistency methods. In this paper, we utilize the student-teacher structure as the implementation mechanism of consistency regularization in the PLMT framework.
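The consistency constraint described above can be sketched as follows. This is an illustrative NumPy implementation of a mean-squared-error consistency loss between the probability outputs of two models for the same input, not the authors' code:

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistency_loss(student_logits, teacher_logits):
    # MSE between the two probability maps; zero iff the
    # two models agree exactly on every pixel and class.
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return float(np.mean((p_s - p_t) ** 2))
```

Minimizing this term pushes the student to match the teacher on perturbed inputs, which is the low-density-separation effect the text describes.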

Self training

Self-training [10] is also known as pseudo-labeling, which essentially means training a model with little labeled data and generating pseudo-labels for the unlabeled data. Recently, it has increasingly attracted attention and been widely used in deep learning. Bai et al. [18] introduced a method named semiFCN that performs self-training for medical image segmentation by amalgamating labeled and unlabeled data during the training process. Yang et al. [19] presented the ST++ method for semi-supervised semantic segmentation, employing strong data augmentation on unlabeled data and adjusting the order in which unlabeled data are used according to the reliability of the pseudo-labels. Zou et al. [20] improved the quality of the pseudo-labels by fusing pixel-level and image-level pseudo-labels and applying strong data augmentation. Different from the above methods, we merge the student-teacher structure into the self-training stream to yield more precise pseudo-labels.

Semi-supervised medical image segmentation

Since semi-supervised methods can alleviate the scarcity of labeled data, several methods have recently been applied to medical image segmentation. For example, Yu et al. [21] amalgamated the student-teacher structure and uncertainty estimation to execute the left atrium segmentation task. Shi et al. [22] introduced an uncertainty estimation semi-supervised method designed to capture inconsistent predictions across multiple cost-sensitive settings to diminish prediction uncertainty. Luo et al. [23] explored the dual-task consistency between the segmentation predictions and geometry-aware level-set regression through a dual-task network. Wu et al. [24] proposed MC-Net+, which has multiple decoder outputs, for semi-supervised medical image segmentation by establishing consistency restrictions among the outputs of multiple decoders. Similarly, Luo et al. [25] utilized a pyramid prediction network, learning from the unlabeled data by encouraging multiple scales to yield consistent predictions. Furthermore, other novel semi-supervised approaches [26–29] have also demonstrated excellent performance in specific medical image segmentation tasks. However, the aforementioned methods seldom concentrate on the relationship between consistency regularization and self-training. In contrast, the PLMT framework integrates the student-teacher structure into the self-training pipeline, facilitating the extraction of additional valuable representations from unlabeled data.

Method

Problem definition

In this section, we detail the proposed PLMT framework, as illustrated in Fig 1. Before describing our method, we introduce the formal representation of our dataset and network. Since medical image annotations are scarce, we aim to train the model with a small number of labeled images and a large number of unlabeled images to improve the segmentation accuracy on the test dataset. The labeled images D_L = {(x_i, y_i)}_{i=1}^{N} and the unlabeled images D_U = {x_i}_{i=N+1}^{N+M} jointly constitute the training dataset, where y_i refers to the ground truth and N ≪ M. Acquiring as many task-relevant and efficient representations as possible from unlabeled data is the most critical problem confronted by every semi-supervised medical image segmentation method.

Fig 1. Overview of the proposed PLMT framework.

https://doi.org/10.1371/journal.pone.0300039.g001

As illustrated in Fig 1, the primary architecture employed in the PLMT is a student-teacher structure, where the parameters of the teacher model are updated from the student model using the exponential moving average (EMA) during the training process. To facilitate description, we denote the teacher and student networks by f(θ′) and f(θ), respectively, and f(θ*) refers to the network with the optimal parameters θ* used for producing the pseudo-labels. The training sample x_i is fed into f(θ) to obtain the probability output p_i. Similarly, during the generation of pseudo-labels, the unlabeled sample x_i^u is fed into f(θ*) to yield p_i^*, and argmax(p_i^*) generates the corresponding one-hot pseudo-label ŷ_i.
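The EMA update of the teacher parameters can be sketched as below. This is a minimal illustration, not the authors' implementation; the decay value 0.99 is an assumption (the paper does not state it):

```python
import numpy as np

def ema_update(teacher_params, student_params, decay=0.99):
    # theta_teacher <- decay * theta_teacher + (1 - decay) * theta_student,
    # applied per parameter tensor after each optimizer step.
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]
```

Because the teacher is a slow-moving average of the student, its predictions are smoother and serve as more stable consistency targets.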

Framework and pipeline

From a general perspective, the PLMT is an end-to-end semi-supervised framework, but in detail it comprises three primary stages, similar to the self-training workflow. These stages are: (1) training the baseline network to determine its optimal parameters, (2) leveraging the segmentation model with the optimal parameters to produce pseudo-labels, and (3) re-training the segmentation network from scratch while incorporating the pseudo-label supervised loss. For convenience of description, we denote these stages as Stage A, Stage B, and Stage C, as depicted in Fig 1.

During Stage A of self-training, the model can only be trained on a small amount of labeled data and cannot utilize the information from the numerous unlabeled samples. In contrast, in Stage A of PLMT, we employ the Mean Teacher structure, which can fully exploit the vast amount of unlabeled data, optimizing the segmentation model parameters and producing the precise pseudo-labels that are essential for achieving outstanding outcomes with the self-training approach.

Stage B of the PLMT corresponds to the pseudo-label generation step in the self-training approach. In other words, the segmentation model obtained from Stage A is used to procure pseudo-labels for the unlabeled samples. To streamline the process, additional techniques, such as setting a pseudo-label confidence threshold, are not employed.
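The threshold-free pseudo-label generation of Stage B can be sketched as follows; the `(N, C, H, W)` array layout is our assumption for illustration:

```python
import numpy as np

def make_pseudo_labels(prob_maps):
    """prob_maps: (N, C, H, W) softmax outputs of f(theta*).
    Returns hard one-hot pseudo-labels of the same shape;
    no confidence threshold is applied."""
    cls = prob_maps.argmax(axis=1)             # (N, H, W) class indices
    one_hot = np.eye(prob_maps.shape[1])[cls]  # (N, H, W, C) one-hot
    return np.moveaxis(one_hot, -1, 1)         # back to (N, C, H, W)
```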

In Stage C of the PLMT framework, the segmentation network is trained using a teacher-student structure, integrating pseudo-labels and consistency regularization. In contrast to the conventional Mean Teacher structure, PLMT provides an additional supervision loss from the pseudo-labels. Furthermore, relative to the self-training method, PLMT supplies a consistency regularization loss and more precise pseudo-labels, facilitating the extraction of effective representations from unlabeled samples.

Algorithm 1 The training pipeline of the PLMT framework

Require: labeled samples: D_L = {(x_i, y_i)}_{i=1}^{N}, unlabeled samples: D_U = {x_i}_{i=N+1}^{N+M}

Require: student and teacher model parameters in Stage A: θ_A and θ′_A

Require: student and teacher model parameters in Stage C: θ_C and θ′_C

Require: maximum iterations: iter_max

Require: semi-supervised loss weights: λ_A and λ_C

Require: trainable parameters: α, β and temperature factor: K

Ensure: optimized parameters of the student model in Stage A: θ*

Ensure: segmentation network parameters: θ_C

 ########### Stage A ###########

1: count ← 0

2: while count < iter_max do

3:  for each batch b_A = b_l + b_u do

4:   p_i ← f(x_i; θ_A), p̃_i ← f(x_i; θ′_A)

5:   L_sup ← l_ce(p_i, y_i), L_con ← l_mse(p_i, p̃_i)

6:   L_A ← L_sup + λ_A · L_con

7:   Update θ_A by the optimizer and update θ′_A by EMA

8:   count ← count + 1

9:  end for

10: end while

 ########### Stage B ###########

11: for x_i^u in D_U do

12:  p_i^* ← f(x_i^u; θ*)

13:  ŷ_i ← argmax(p_i^*)

14: end for

 ########### Stage C ###########

15: count ← 0

16: while count < iter_max do

17:  for each batch b_C = b_l + b_p do

18:   p_i ← f(x_i; θ_C), p̃_i ← f(x_i; θ′_C)

19:   L_sup ← l_ce(p_i, y_i), L_con ← l_mse(p_i, p̃_i)

20:   L_pl ← l_ce(p_i, ŷ_i)

21:   L_C ← L_sup + λ_C (α · K · L_con + β · L_pl)

22:   Update θ_C by the optimizer and update θ′_C by EMA

23:   count ← count + 1

24:  end for

25: end while

26: return θ_C

Loss function

In this subsection, we introduce the loss functions of the PLMT framework. Overall, the PLMT optimizes the backbone model in Stage A and Stage C, while Stage B is only applied to produce pseudo-labels and therefore does not require a loss function. L_A and L_C denote the loss functions required in Stage A and Stage C, respectively. The optimization of Stage A resembles that of Mean Teacher. Therefore, L_A can be described as:

L_A = L_sup + λ_A · L_con    (1)

where λ_A is the trade-off weight between the supervision and consistency losses.

L_sup employs the standard cross-entropy function to compute the supervision loss between the outputs of the labeled samples and the corresponding ground truths:

L_sup = (1/N) Σ_{i=1}^{N} l_ce(p_i, y_i)    (2)

where l_ce is the cross-entropy loss function. L_con denotes the consistency loss between the outputs of the teacher and student models on unlabeled samples. In this study, we employ the mean squared error function to compute this loss:

L_con = (1/M) Σ_{i=N+1}^{N+M} l_mse(p_i, p̃_i)    (3)

where l_mse refers to the mean squared error loss function and p̃_i denotes the output of the teacher model.

Fig 1 indicates that, compared to Stage A, an extra pseudo-label supervision loss is introduced into the optimization of Stage C. Hence L_C is formulated as follows:

L_C = L_sup + λ_C (α · K · L_con + β · L_pl)    (4)

In L_C, α and β are trainable parameters that respectively indicate the weights of the different unsupervised losses in Stage C, where α + β = 1 and α, β > 0. Due to the significant magnitude difference between the values of L_con and L_pl, we introduce a temperature factor K to bridge this gap, ensuring that the PLMT framework does not overfit the smaller loss during training. This measure preserves the intended purpose of using multiple semi-supervised losses by avoiding excessive weight allocation to smaller loss values. The L_sup and L_con in Stage C take the same form as in Stage A. L_pl refers to the pseudo-label supervised loss between the pseudo-labels and the outputs of the student model on unlabeled samples. It also adopts the cross-entropy function and is written as:

L_pl = (1/M) Σ_{i=N+1}^{N+M} l_ce(p_i, ŷ_i)    (5)
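One way the adaptive weights and temperature factor might be realized is sketched below. The softmax parameterization of α and β is our assumption (the paper only states that they are trainable with α + β = 1 and α, β > 0), and the function combines scalar loss values for illustration only:

```python
import numpy as np

def stage_c_unsup_loss(l_con, l_pl, w_logits, K=1000.0):
    """Combine the Stage C consistency and pseudo-label losses.
    alpha, beta = softmax(w_logits) guarantees alpha + beta = 1 and both > 0;
    K rescales the much smaller consistency loss toward the
    magnitude of the pseudo-label loss."""
    e = np.exp(np.asarray(w_logits) - np.max(w_logits))
    alpha, beta = e / e.sum()
    return alpha * K * l_con + beta * l_pl, (alpha, beta)
```

With equal logits the two weights start at 0.5 each; training then shifts them toward whichever loss benefits the task, while K keeps the two terms comparable in scale.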

In short, the PLMT framework yields more precise pseudo-labels through the student-teacher model. Additionally, the newly introduced pseudo-label supervision loss augments the performance of this student-teacher architecture. Algorithm 1 provides an overview of the proposed PLMT approach.

Experiments

Datasets and pre-processing

We evaluated our approach on three public datasets: the ACDC, LA, and Spleen datasets.

The ACDC dataset is a public benchmark dataset from the 2017 Automated Cardiac Diagnosis Challenge [30]. It contains 100 labeled MR samples in total and includes annotations for three classes: left ventricle (LV), right ventricle (RV), and myocardium (MYO). For fair training and inference, 80 subjects are allocated to the training set and the remaining 20 to the testing set.

The LA dataset is the benchmark dataset of the 2018 Atrial Segmentation Challenge [31], containing 100 gadolinium-enhanced MR imaging scans for training, with a resolution of 0.625 × 0.625 × 0.625 mm. Since the testing set of LA does not include public labels, following [21, 24], we use 80 samples as the training set and the remaining 20 samples for testing.

The Spleen dataset is one of the ten tasks of the Medical Segmentation Decathlon Challenge [32]. It was collected from patients receiving chemotherapy treatment for liver metastases and acquired at the Memorial Sloan Kettering Cancer Center. The dataset consists of 61 CT scans in total, but only 41 have expert annotations. Following [33], 33 samples compose the training set and the remaining 8 samples are used as the testing set.

Table 1 describes the division of the samples of the three datasets in detail, with * referring to labeled samples and Δ to unlabeled samples. Due to varying image sizes in the three original datasets, we resize all the 3D scans into 256 × 256 2D slices. Afterward, we perform 2D rotation and flip operations across the three datasets for data augmentation and normalize the samples to zero mean and unit variance.

Table 1. The split of labeled and unlabeled samples in the training and test datasets.

https://doi.org/10.1371/journal.pone.0300039.t001

Implementation details

In this paper, our method is implemented with the PyTorch framework and executed on an Intel(R) i7 13700K CPU and an NVIDIA 4090 GPU. During the optimization stage, we employ the SGD optimizer with a weight decay of 0.001 and a momentum of 0.9, training for 36,000 iterations. An initial learning rate of 0.1 is adopted, with the "poly" strategy dictating the learning rate decay. The batch size is set to 24, with 12 labeled and 12 unlabeled samples each. The total semi-supervised loss weights λ_A and λ_C are set to 0.1. The temperature factor K is set to 1000. Following [9, 21], we apply a time-dependent Gaussian warming-up function λ(t) = λ · exp(−5(1 − t/t_max)²) to balance the supervised and unsupervised losses, where t represents the current iteration count and t_max denotes the maximum number of iterations.
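The Gaussian warm-up commonly used in [9, 21] can be sketched as below; the exact functional form and the plateau behavior after t_max are assumptions based on that prior work rather than details stated here:

```python
import math

def rampup_weight(t, t_max, lam=0.1):
    """lambda(t) = lam * exp(-5 * (1 - t/t_max)^2):
    near zero early in training, ramping smoothly up to lam at t_max,
    then held constant."""
    t = min(t, t_max)
    return lam * math.exp(-5.0 * (1.0 - t / t_max) ** 2)
```

Ramping the unsupervised weight up slowly prevents the noisy early-training consistency and pseudo-label targets from dominating the supervised signal.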

In addition, we employ a 2D UNet with initial channels of 16 and four downsampling and upsampling modules as the segmentation backbone network. Mean Teacher [9], Self-training [10], Entropy minimization [34], DCT [35] and UAMT [21] are adopted as the comparison methods.

We employ four widely used metrics to evaluate the segmentation performance of all methods, including the Dice similarity coefficient (Dice), Jaccard Index (Jaccard), 95% Hausdorff Distance (HD95), and Average Surface Distance (ASD). Specifically, Dice and Jaccard measure the similarity between the segmentation output and the ground truth. ASD and HD95 capture the boundary differences between the output and the label.
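For reference, the two overlap metrics can be sketched for binary masks as follows; this is an illustrative implementation, not the evaluation code used in the paper:

```python
import numpy as np

def dice_jaccard(pred, gt, eps=1e-8):
    """pred, gt: binary (0/1 or boolean) masks of equal shape.
    Returns (Dice, Jaccard); both are 1.0 for a perfect match."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    jacc = inter / (np.logical_or(pred, gt).sum() + eps)
    return float(dice), float(jacc)
```

HD95 and ASD, in contrast, are surface-distance metrics and require extracting mask boundaries, so they are omitted from this sketch.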

Results

Performance on the LA dataset

We present the quantitative results of the LA segmentation task in Table 2, which shows the performance of our proposed method and the five comparative methods, alongside the results of a U-Net model trained with 10%, 20%, and all labeled samples as references. Table 2 indicates that the PLMT framework outperforms the other five semi-supervised methods across all evaluation metrics. Specifically, compared with the UNet model without any semi-supervised method, the Dice coefficient of the PLMT increased by 5.06% and 2.57% when trained with only 10% and 20% of the labeled data, respectively. Compared to the best results obtained by the other semi-supervised methods, PLMT shows an improvement of 2.14% and 1.44% in the Dice coefficient. Furthermore, when trained with only 20% of the labeled data, the PLMT framework shows a marginal difference of only 0.92% in the Dice coefficient compared to the results obtained from the UNet model with fully labeled data. This demonstrates that the PLMT framework can effectively leverage unlabeled data to extract more efficient representations and significantly enhance performance over other semi-supervised methods.

Table 2. Quantitative comparison results on the LA dataset.

Best results are in bold and suboptimal results are underlined. * and ** indicate p ≤ 0.05 and p ≤ 0.02 from a two-sided paired t-test when comparing the PLMT with other methods, respectively.

https://doi.org/10.1371/journal.pone.0300039.t002

To intuitively convey the excellent segmentation performance of the PLMT method, we also provide several visualized examples of our framework and the other comparative methods in Fig 2. The red portions indicate the segmentation masks produced by the different methods, and the "label" column shows the corresponding ground truths of the samples. Compared with the other semi-supervised methods, the segmentation masks produced by the PLMT exhibit a closer alignment with the ground truths. This shows that the PLMT can efficiently delineate the regions of interest.

Fig 2. Visual comparison examples on the LA dataset.

https://doi.org/10.1371/journal.pone.0300039.g002

Performance on the Spleen dataset

Similar to the evaluation on the LA dataset, Fig 3 and Table 3 show the corresponding results and visual segmentation examples of the PLMT framework and the other comparative methods on the Spleen dataset. They demonstrate that: (1) Relative to the other five semi-supervised methods, our model performs best on all evaluation metrics, although its ASD is marginally inferior to the self-training framework trained with 10% labeled data. (2) By efficiently leveraging representations from unlabeled data, our model delivers a Dice score improvement of 3.96% and 3.77% over the supervised UNet model trained with 10% and 20% labeled samples, respectively. Compared to the best results obtained by the other semi-supervised methods, PLMT shows an improvement of 0.93% (92.88%, DCT with 10% labeled data) and 1.76% (93.81%, DCT with 20% labeled data) in the Dice coefficient. (3) Fig 3 depicts that, compared with the other segmentation masks, the masks yielded by PLMT enable clear recognition of the target region and exclude erroneous predictions.

Fig 3. Visual comparison examples on the Spleen dataset.

https://doi.org/10.1371/journal.pone.0300039.g003

Table 3. Quantitative comparison results on the Spleen dataset.

Best results are in bold and suboptimal results are underlined. * and ** indicate p ≤ 0.05 and p ≤ 0.02 from a two-sided paired t-test when comparing the PLMT with other methods, respectively.

https://doi.org/10.1371/journal.pone.0300039.t003

Performance on the ACDC dataset

Different from the LA and Spleen binary segmentation datasets, ACDC is a multi-class dataset that includes the right ventricle (RV), myocardium (MYO), and left ventricle (LV) components. Table 4 and Fig 4 show the quantitative results and visual segmentation examples of the PLMT approach and the other methods on the ACDC dataset. It can be seen from Table 4 that the PLMT framework obtains the best performance on most of the evaluation metrics. In the 10% labeled sample results, the PLMT achieves a Dice gain of 4.75%, 2.80%, and 2.72% over the UNet without any semi-supervised method in RV, MYO, and LV, respectively. In the 20% labeled sample results, the PLMT achieves a Dice gain of 3.91%, 3.35%, and 2.06%, respectively. In all three categories, the PLMT also achieves the highest Dice score compared to the other five semi-supervised methods. This demonstrates that combining consistency regularization and self-training indeed yields superior segmentation performance compared to a single semi-supervised method.

Fig 4. Visual comparison examples on the ACDC dataset.

https://doi.org/10.1371/journal.pone.0300039.g004

Table 4. Quantitative comparison results on the ACDC dataset.

Best results are in bold and suboptimal results are underlined. * and ** indicate p ≤ 0.05 and p ≤ 0.02 from a two-sided paired t-test when comparing the PLMT with other methods, respectively.

https://doi.org/10.1371/journal.pone.0300039.t004

In Fig 4, the red, green, and blue portions indicate the segmentation of the right ventricle, myocardium, and left ventricle, respectively. These visual examples show that, compared with the segmentation results of other methods, our segmentation maps fit the ground truths closely, particularly for the right ventricle, where the mask of the PLMT is significantly better than the results of the other semi-supervised methods. Furthermore, the PLMT framework is significantly more precise than the other methods at detecting ambiguous boundaries and complex regions.

In summary, based on the results on the three datasets, our PLMT framework demonstrates superior performance over the other five semi-supervised methods for medical image segmentation. It should be noted that, to purely validate the efficacy of combining consistency regularization with the self-training pipeline, we do not employ strong data augmentation in the PLMT approach, even though injecting strong data augmentation into the input samples has the potential to improve the performance of semi-supervised segmentation methods, as employed in [19, 36].

Ablation studies

The PLMT framework is an approach that integrates the student-teacher structure into the self-training process with two semi-supervised losses and adaptive loss weights. In addition, the temperature factor K is introduced to bridge the magnitude gap between different semi-supervised loss values. Therefore, in the ablation experiments, we focus on verifying the effectiveness of the PLMT structure and the temperature factor K.

Effect of the PLMT structure

Since the PLMT structure is the combination of the Mean Teacher and self-training methods, we have already demonstrated that the proposed PLMT has superior performance over the single Mean Teacher or self-training method through the quantitative results on the three datasets in Tables 2–4 of the comparison experiments. Therefore, in the ablation experiment, we show more visualization examples to illustrate that the PLMT has superior segmentation performance.

Figs 5–7 show visual examples of PLMT versus MT and self-training on the three datasets, where "Label" refers to the ground truth corresponding to the sample, and "SegMap" and "CAM" refer to the segmentation maps and corresponding gradient localization maps produced by the different semi-supervised methods. As can be seen from Fig 4, the segmentation challenge in the ACDC dataset lies primarily in the right ventricle, denoted by the red portion; thus, we only focus on the right ventricle for the comparison in Fig 7. From these figures, we can see that the gradient localization maps resulting from PLMT are more accurate, and its segmentation maps match the labels better than those of the single Mean Teacher or self-training methods. This demonstrates that the PLMT framework, which integrates two semi-supervised methods, is both more accurate and more generalizable than a single semi-supervised method.

Fig 5. Visual comparison examples on the LA dataset in ablation study.

https://doi.org/10.1371/journal.pone.0300039.g005

Fig 6. Visual comparison examples on the Spleen dataset in ablation study.

https://doi.org/10.1371/journal.pone.0300039.g006

Fig 7. Visual comparison examples on the ACDC dataset in ablation study.

https://doi.org/10.1371/journal.pone.0300039.g007

Effect of the temperature factor K

The ablation study on the temperature factor K is performed on the LA dataset using 10% labeled samples, primarily to demonstrate the effect of the value of K and the weights of the different unsupervised losses (see Eq 4). Table 5 shows the quantitative results of the ablation study, in which α refers to the weight of the consistency loss, β refers to the weight of the pseudo-label loss, the bolded parts indicate the best results, and the underlined parts indicate the suboptimal results.

Table 5. Quantitative results of the temperature factor K and the adaptive weights α and β of different unsupervised losses on the LA dataset.

Best results are in bold and suboptimal results are underlined.

https://doi.org/10.1371/journal.pone.0300039.t005

As we can observe from Table 5, when K = 0 there is only the pseudo-label loss in the PLMT framework, which has the same pipeline as self-training; however, since the PLMT still assigns a smaller weight to the pseudo-label loss, it is unable to take full advantage of it. Therefore, the performance of the PLMT is inferior to that of the corresponding self-training method in Table 2 (Dice score: 87.61% → 86.97%). When K = 1, due to the relatively small value of the consistency loss, the PLMT framework blindly increases the weight of the consistency loss to minimize the overall loss, resulting in overfitting. When K is approximately 1000, the framework effectively bridges the magnitude gap between the pseudo-label loss and the consistency loss. In this setting, PLMT efficiently leverages the strengths of both semi-supervised losses, resulting in superior segmentation performance. The ablation study demonstrates that fine-tuning the value of the temperature factor K in the PLMT framework further improves the model performance in medical image segmentation tasks.

Discussion

In the medical image analysis domain, it is expensive and time-consuming to obtain many precisely labeled images. Semi-supervised learning methods can decrease the reliance on labeled data and reduce the cost and time of data preparation. Furthermore, for some rare diseases where it is difficult to obtain enough labeled data, semi-supervised methods can better utilize the small amount of labeled data for more effective research. However, traditional semi-supervised learning methods usually focus on only one perspective, such as consistency regularization or pseudo-labeling. To design a more accurate and robust semi-supervised method, we propose the PLMT framework. Unlike other semi-supervised methods, the PLMT framework integrates the student-teacher structure into the self-training pipeline and combines pseudo-labeling with the consistency regularization method to achieve much more precise segmentation performance. In particular, in the PLMT framework, we utilize the teacher-student structure to obtain more accurate pseudo-labels in Stage A. In Stage C, we establish the teacher-student structure with both consistency loss and pseudo-label loss. To better trade off the contributions of the two semi-supervised losses for different segmentation tasks, we use adaptive loss weights, so that PLMT can dynamically adjust the weight of each semi-supervised loss and achieve more accurate segmentation with limited labels. In addition, we introduce the temperature factor K to eliminate the magnitude gap between the values of the different semi-supervised losses and thereby avoid the risk of overfitting in the PLMT framework.

To validate the performance of the PLMT framework, we evaluate it on three different medical image segmentation tasks to demonstrate its effectiveness and robustness. The comparison results in Tables 2–4 show that PLMT achieves the best results compared to the other five semi-supervised segmentation methods. In addition, the visual examples in Figs 5–7 show that PLMT achieves more accurate segmentation of lesions or regions of interest with limited labels. The results in Table 5 show that the temperature factor K effectively avoids the overfitting risk arising from the adaptive semi-supervised loss weights in the PLMT framework.

Overall, PLMT is a medical image segmentation framework that incorporates two semi-supervised methods and achieves a significant improvement in segmentation performance over single semi-supervised methods such as consistency regularization or pseudo-labeling. The PLMT framework demonstrates that combining multiple semi-supervised methods, each operating from a different perspective, improves the accuracy and robustness of the segmentation model more than any single method alone. It should be noted that when combining multiple semi-supervised losses, the loss values must be brought to the same magnitude by the temperature factor K to avoid overfitting the framework. In future work, we aim to investigate frameworks that integrate further semi-supervised methods to improve the accuracy and generalization of medical image segmentation models and to reduce their dependence on labeled data.

Compared with traditional semi-supervised methods, the PLMT framework requires more training time due to the training of the segmentation model in both Stage A and Stage C. In future work, adopting more efficient methods to generate more accurate pseudo-labels could further improve the performance of the PLMT framework.

Conclusion

In this study, we introduce a novel and effective semi-supervised learning framework, named PLMT, for medical image segmentation. By synergistically integrating self-training with the Mean Teacher structure, our method outperforms both of these standalone semi-supervised learning approaches. Additionally, our method adaptively adjusts the loss weights between the consistency and pseudo-label losses to further optimize segmentation performance, especially under the constraint of limited labeled samples. Extensive experiments demonstrate that our framework achieves superior performance compared with these two standalone methods on three medical datasets. While this research represents an initial exploration into the confluence of self-training and consistency regularization, future work will incorporate diverse strategies to enhance the efficacy of semi-supervised methods in medical image segmentation.

Supporting information

S1 Fig. The setup of different semi-supervised methods.

(a) refers to the setting of the Mean Teacher, (b) refers to the setting of self-training, and (c) refers to the setting of the proposed PLMT. Since the PLMT framework combines the Mean Teacher and self-training methods, its settings are the same as those of the two constituent methods.

https://doi.org/10.1371/journal.pone.0300039.s001

(TIF)

Acknowledgments

All datasets used in this paper are publicly available. The code and results included in this study are available at https://github.com/LBL0704/PLMT. The authors have no relevant competing interests to declare regarding the content of this article.
