Abstract
Cardiac segmentation plays a crucial role in the diagnosis of cardiovascular diseases. However, the manual annotation of cardiac structures is a labor-intensive and time-consuming task that requires highly trained experts, and the availability of labeled data for training segmentation models is often limited by the challenges of acquiring accurate annotations. To address this issue, we propose a novel semi-supervised cardiac segmentation framework that needs only a small set of labeled data together with a larger pool of unlabeled data. We propose three strategies, a dynamic pseudo-label threshold map, robust entropy minimization and contrastive consistency, from the perspectives of pseudo-labeling, entropy minimization and consistency regularization, respectively. Specifically, we generate pixel-wise, class-wise and adaptive threshold maps and use them in a robust entropy minimization loss to reduce the noise from low-confidence samples. Besides, to utilize the unlabeled data sufficiently, we add a contrastive consistency loss as regularization. Extensive experiments on the ACDC and MMWHS datasets demonstrate that our method achieves competitive performance compared to state-of-the-art approaches across various labeled-data ratios. Ablation studies further validate the effectiveness and robustness of each component. Our framework shows strong potential for accurate diagnosis with limited annotations, and our code will be made publicly available.
Citation: Mi Y, Zhang J, Jin H, Yin J, He Y, Xie G, et al. (2026) Dynamic thresholding and robust contrastive techniques for enhanced semi-supervised cardiac segmentation. PLoS One 21(4): e0342567. https://doi.org/10.1371/journal.pone.0342567
Editor: Francesco Bardozzo, University of Salerno: Universita degli Studi di Salerno, ITALY
Received: June 21, 2025; Accepted: January 26, 2026; Published: April 6, 2026
Copyright: © 2026 Mi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data that support the findings of this study are openly available in ACDC at the following URL: https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html and MMWHS at the following URL: https://github.com/zxhzxhgithub/mmwhs.
Funding: The author(s) received no specific funding for this work.
Competing interests: No authors have competing interests.
1 Introduction
Cardiac segmentation plays a crucial role in medical imaging analysis, specifically in the field of cardiology. It involves the accurate delineation of the heart’s anatomical structures from medical images, such as magnetic resonance imaging (MRI) or computed tomography (CT) scans. Accurate segmentation enables the planning of surgical interventions, such as valve replacements or cardiac resynchronization therapy.
In recent years, deep learning techniques have shown great potential in automating cardiac segmentation tasks [1–5]. However, the success of these deep learning methods is heavily predicated on the availability of large volumes of meticulously annotated data. In the context of medical imaging, acquiring such extensive labeled datasets presents significant challenges due to the high cost, the requirement for specialized domain-specific expertise from cardiologists and radiologists, and the protracted, laborious process involved in generating accurate, pixel-level annotations [6]. This scarcity of labeled data severely limits the applicability and generalization capabilities of fully supervised deep learning models in real-world clinical settings, where diverse patient populations, varying anatomical structures, and differing imaging protocols are common.
To circumvent this critical data bottleneck, many researchers have explored annotation-efficient methods for medical image segmentation, including data augmentation [7,8], utilizing external related datasets [9], and particularly, leveraging unlabeled data through semi-supervised learning [10–15]. Among these, semi-supervised segmentation has emerged as a highly practical and impactful approach [16], promoting the synergistic utilization of readily obtainable unlabeled data in combination with a limited quantity of labeled data for training robust segmentation models. This technique holds significant promise for advancing clinical applications by enabling model deployment with less reliance on costly manual annotations.
However, despite its promise, existing semi-supervised segmentation methods face substantial hurdles, especially when applied to complex medical images such as cardiac scans [17]. Medical images often exhibit intricate anatomical structures, subtle pathological features critical for diagnosis, and considerable variations stemming from different imaging modalities, patient physiologies, and acquisition settings. These complexities make it exceptionally difficult for current semi-supervised methods to accurately estimate the uncertainty and confidence of generated pseudo-labels [12,16]. For instance, the often-fuzzy boundaries between cardiac chambers or the presence of small, irregular lesions can lead to ambiguous predictions, resulting in noisy or incorrect pseudo-labels that can mislead model training.
Furthermore, effectively and efficiently leveraging the vast amount of unlabeled data remains a challenge. Many existing methods [11,18,19] typically rely on calculating a fixed confidence threshold to select “high-quality" pseudo-labels for supervision. This static thresholding approach, however, is inherently problematic. It fails to adapt to the dynamic evolution of pseudo-label quality throughout the training process; what might be considered high-confidence early in training could still be quite noisy, while later, slightly lower-confidence predictions could be highly accurate. Consequently, a static threshold risks either discarding valuable information by being too conservative or introducing significant noise by being too permissive, thereby hindering optimal model performance. Beyond pseudo-label selection, many current methods primarily focus on pixel-level consistency [11], potentially underutilizing the rich contextual and structural information present in unlabeled medical images that could be exploited for more robust learning.
Based on the above critical observations and the identified limitations in current semi-supervised cardiac segmentation, we propose a novel semi-supervised segmentation framework designed to robustly leverage a small amount of annotated data alongside a large pool of unlabeled data to achieve superior outcomes. Specifically, to overcome the limitations of static thresholding and accurately estimate pseudo-label confidence, we introduce a dynamic pseudo-label threshold map strategy. This strategy generates a pixel-wise, class-wise, and adaptive map, enabling a more nuanced and accurate selection of high-confidence pseudo-labels throughout the dynamic training process. Furthermore, to mitigate the detrimental impact of low-confidence pseudo-labels and amplify the beneficial effect of high-confidence ones, we propose a robust entropy minimization loss function that explicitly leverages the insights from our dynamic threshold map. Lastly, to more effectively and efficiently utilize the rich information within unlabeled data beyond just pseudo-labeling, we introduce a novel contrastive consistency strategy that optimizes the model in an unsupervised and contrastive manner, capturing broader structural relationships. Extensive experiments are conducted on two cardiac segmentation datasets ACDC [20] and MMWHS [21]. Experimental results on two cardiac segmentation benchmarks show that our method delivers strong, competitive performance compared to current state-of-the-art approaches under various percentages of labeled data during training. We also conduct extensive experiments and detailed analysis to thoroughly evaluate the effectiveness and robustness of our method under various settings, including challenging low-data regimes.
Our main contributions can be summarized as follows:
- We propose a novel semi-supervised segmentation framework for cardiac segmentation, which needs only a small set of labeled data and can achieve results competitive with a fully supervised model.
- We propose dynamic pseudo-label threshold map, robust entropy minimization and contrastive consistency strategies. They can not only generate pixel-wise, class-wise and adaptive threshold maps but also utilize the unlabeled data efficiently and effectively.
- Our method demonstrates performance competitive with the state of the art on two cardiac segmentation datasets, as evidenced by comprehensive benchmarking.
2 Related work
2.1 Cardiac image segmentation
Cardiac segmentation has garnered significant attention in the field of medical imaging analysis. In recent years, numerous studies have focused on developing robust and efficient techniques for cardiac segmentation. Early works [1,22–24] utilized deep fully convolutional neural network (FCN) [25] architectures to address cardiac MRI segmentation. For example, [22] tackled automated left and right ventricle segmentation through the application of a deep fully convolutional neural network architecture, and [1] used an ensemble of UNet-inspired [26] architectures to segment cardiac structures at each time instance of the cardiac cycle. With the success of the Transformer [27] in various fields, Transformer-based segmentation methods [2–5] have surpassed FCN-based methods, largely because the intrinsic locality of convolution operations limits U-Net-style architectures in explicitly modeling long-range dependencies. Therefore, [2] proposed TransUNet, which combines the merits of both Transformers and U-Net: Transformers serve as strong encoders, while the U-Net decoder recovers localized spatial information to enhance finer details. Later, [4] proposed the Fully Convolutional Transformer (FCT) to learn fine-grained image representations and effectively capture long-range dependencies.
However, these methods all need large volumes of data to obtain desirable performance, and it is difficult to acquire large amounts of annotated data, particularly in medical imaging, where experts are needed to provide reliable and accurate annotations [28]. In contrast, our paper concentrates on semi-supervised segmentation, which needs only a small amount of labeled data and can still achieve desirable results.
2.2 Semi-supervised learning
Semi-supervised learning can be divided into five main groups [29]: generative methods, consistency regularization methods, graph-based methods, pseudo-labeling methods and hybrid methods. Generative methods [30–32] learn the implicit features of data to better model data distributions. Consistency regularization methods [33,34] impose a regularization term on the loss function to specify prior constraints. Graph-based methods [35–37] assume that a graph can be derived from the raw dataset, with each node representing a training sample and each edge indicating a measure of similarity between pairs of nodes. Pseudo-labeling methods [38,39] select high-confidence pseudo-labels and use them for supervised training. Hybrid methods [40–42] combine the aforementioned methods to bring comprehensive performance improvements. For example, MixMatch [40] combines consistency regularization and entropy minimization into a unified loss function, and FixMatch [42] combines consistency regularization and pseudo-labeling.
2.3 Semi-supervised medical image segmentation
Semi-supervised medical image segmentation can be divided into three main groups: pseudo-labeling methods [11,18], consistency regularization methods [13,43] and knowledge prior methods [44,45]. [11] propose a contrastive-radical network based on uncertainty estimation and a separate self-training strategy. [13] design a self-paced learning strategy for co-training that lets jointly-trained neural networks focus on easier-to-segment regions first, and then gradually consider harder ones. [44] propose a supervised local contrastive loss that leverages limited pixel-wise annotation to force pixels with the same label to gather in the embedding space. [6] introduce a robust class-wise sampling method and dynamic stabilization for a better training strategy. [46] introduce a data-adaptive regularization paradigm that helps minimize overfitting to labeled data under high confidence values.
3 Proposed method
3.1 Problem formulation
In segmentation tasks, given an image x ∈ R^(H×W), the goal is to predict the label map ŷ ∈ [0,1]^(H×W×C), where C is the class number. The value of ŷ(h,w,c) is between 0 and 1, representing the probability of the corresponding class. In semi-supervised segmentation, the dataset is divided into a labeled image set Dl = {(xi, yi)}, i = 1, …, Nl, and an unlabeled image set Du = {xi}, i = 1, …, Nu, where Nl and Nu represent the numbers of images in the two sets and Nl ≪ Nu. The goal of semi-supervised segmentation is to fully leverage the two image sets to obtain accurate predictions.
3.2 Overall architecture
Similar to other methods [6,47] in semi-supervised segmentation, we adopt the student and teacher setup [48] as the foundational structure for semi-supervised segmentation. Specifically, as shown in Fig 1, the teacher model with parameters θt is updated by the exponential moving average of the parameters θs of the student model as follows:

θt ← β·θt + (1 − β)·θs,    (1)

where β is the moving average coefficient [48] and the update is applied at every training step t. For the teacher model, we input the x from Du to obtain the pseudo-label ŷ (see Sect 3.3 for details). For the student model, we input the x from Dl and Du to get the prediction maps p. For the labeled data Dl, we use the common supervised loss function:

Ls = (1/B) Σi ℓce(pi, yi),    (2)

where B is the batch size and ℓce is the pixel-wise cross-entropy loss function. For the unlabeled data Du, we use a robust pixel-wise cross-entropy loss function Lu (see Sect 3.4 for details):

Lu = (1/B) Σi ℓrce(pi, ŷi),    (3)

where ŷi is the pseudo-label for the i-th unlabeled sample generated by the teacher model.

Furthermore, to impose consistency regularization, we add a contrastive loss term Lc (see Sect 3.5 for details). Therefore, our final loss function can be formulated as:

L = Ls + λu·Lu + λc·Lc,    (4)

where λu and λc are the trade-offs between the losses. Then, we can use the loss function L to optimize the student model θs.
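As a concrete illustration, the EMA teacher update and the combined objective above can be sketched in PyTorch as follows; the function names (`ema_update`, `total_loss`) and the example λ values are our own illustrative choices, not taken from the authors' released code:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, beta=0.99):
    """Update teacher parameters as an exponential moving average of the
    student parameters: theta_t <- beta * theta_t + (1 - beta) * theta_s."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(beta).add_(s_param, alpha=1.0 - beta)

def total_loss(loss_sup, loss_unsup, loss_contrast, lambda_u=1.0, lambda_c=0.1):
    """Combine the supervised, robust unsupervised, and contrastive losses.
    The trade-off values lambda_u and lambda_c here are illustrative."""
    return loss_sup + lambda_u * loss_unsup + lambda_c * loss_contrast
```

In a typical training loop, `ema_update` is called once per optimizer step so the teacher lags smoothly behind the student.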
3.3 Dynamic pseudo-label threshold map
Generating pseudo-labels for self-training and entropy minimization is a common strategy in semi-supervised learning [11,18,19,42]. Empirically, not all samples are good enough to serve as pseudo-labels for self-training because of the noise they contain, and when low-quality pseudo-labels are used for self-training, the performance of the model degrades. Existing semi-supervised segmentation methods often generate high-quality pseudo-labels by measuring the confidence of the predicted probabilistic distribution against what is effectively a static threshold. However, a static threshold cannot accurately capture the evolving quality of labels during the dynamic training process.
To address the limitations of static thresholding, we propose a novel dynamic pseudo-label threshold map strategy. This strategy aims to generate a pixel-wise and adaptive confidence threshold map that evolves within each training epoch, updating batch-by-batch. This enables a more accurate and responsive selection of high-quality pseudo-labels for supervision.
Concretely, let t denote the current epoch and k denote the current batch index, where 1 ≤ k ≤ K and K is the total number of batches per epoch. Let Bt,k = {x1, …, xB} be the set of unlabeled images in the k-th batch of epoch t, with batch size B. For each image xi, the teacher model generates a pseudo-label probability map pi ∈ [0,1]^(H×W×C), where H, W, and C denote the height, width, and number of classes, respectively. We first compute the batch-averaged maximum confidence map M̄t,k as:

M̄t,k(h,w) = (1/B) Σi maxc pi(h,w,c),    (5)

where (h,w) represents the spatial coordinates. The dynamic threshold map Mt,k is then updated using an Exponential Moving Average (EMA) to ensure temporal smoothness and stability across batches:

Mt,k = α·Mt,k−1 + (1 − α)·M̄t,k,    (6)

where α is the momentum coefficient. For the initial state M0,0, we initialize all pixel values to 1/C, representing uniform uncertainty.
Furthermore, cardiac structures often exhibit significant class imbalance (e.g., ventricles are much larger than atria). To prevent smaller or lower-confidence classes from being ignored, we introduce a class-wise adaptation. We compute the class-specific confidence score st,k,c for each class c within the current batch:

st,k,c = [ Σi Σ(h,w) 1(ŷi(h,w) = c) · maxc′ pi(h,w,c′) ] / [ Σi Σ(h,w) 1(ŷi(h,w) = c) + ε ],    (7)

where 1(·) is the indicator function, which equals 1 if the predicted class ŷi(h,w) for pixel (h,w) in image xi is c, and 0 otherwise, and ε is a small constant for numerical stability. We then normalize these scores to obtain the class-wise scaling factor wt,k,c:

wt,k,c = st,k,c / maxc′ st,k,c′.    (8)
Finally, the class-wise dynamic pseudo-label threshold Mt,k,c(h,w) is obtained by modulating the spatial threshold map:

Mt,k,c(h,w) = wt,k,c · Mt,k(h,w).    (9)
This mechanism ensures that classes with inherently lower prediction confidence (e.g., harder-to-segment small structures) are assigned proportionally lower thresholds, facilitating the recruitment of valid pseudo-labels for these classes.
For a given unlabeled sample, we generate the final confidence weighting map γ to select high-quality pixels for training. Let c* = argmaxc p(h,w,c) denote the predicted class at location (h,w). A pixel (h,w) is considered reliable for its predicted class c* if its predicted probability exceeds the corresponding dynamic threshold:

γ(h,w) = 1 if p(h,w,c*) > Mt,k,c*(h,w), and γ(h,w) = 0 otherwise,    (10)

where Mt,k,c* is the dynamic pixel-wise and class-wise confidence map (Table 1).
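The batch-wise procedure above, from the batch-averaged confidence map through the class-wise modulation to the reliability mask γ, can be sketched in PyTorch as follows. The tensor layout, the function name `update_threshold_map`, and the default α and ε are our assumptions for illustration:

```python
import torch

def update_threshold_map(probs, M_prev, alpha=0.9, eps=1e-8):
    """One batch update of the dynamic pseudo-label threshold map (a sketch).

    probs:  (B, C, H, W) teacher softmax probabilities.
    M_prev: (H, W) threshold map carried over from the previous batch.
    Returns the updated spatial map M, the class-modulated thresholds
    (C, H, W), and the binary confidence mask gamma of shape (B, H, W).
    """
    B, C, H, W = probs.shape
    conf, pred = probs.max(dim=1)                 # per-pixel max confidence, predicted class
    M_bar = conf.mean(dim=0)                      # batch-averaged maximum confidence map
    M = alpha * M_prev + (1 - alpha) * M_bar      # EMA update across batches

    # Class-specific confidence: mean max-confidence over pixels predicted as class c.
    s = torch.stack([(conf * (pred == c)).sum() / ((pred == c).sum() + eps)
                     for c in range(C)])
    w = s / (s.max() + eps)                       # normalized class-wise scaling factors
    M_class = w.view(C, 1, 1) * M                 # modulate the spatial threshold map

    # Look up, for every pixel, the threshold of its predicted class.
    thr = M_class.unsqueeze(0).expand(B, -1, -1, -1).gather(1, pred.unsqueeze(1)).squeeze(1)
    gamma = (conf > thr).float()                  # reliable-pixel mask
    return M, M_class, gamma
```

The returned `M` would be fed back in as `M_prev` on the next batch, realizing the batch-by-batch evolution described above.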
3.4 Robust entropy minimization
In Sect 3.3, we obtain a map γ that dynamically measures pixel-wise and class-wise confidence. We want high-quality pixels to contribute more to the optimization while the influence of low-quality pixels is eliminated. Therefore, we propose a robust entropy minimization strategy by adding the confidence value γ to the cross-entropy loss function:

Lu = −(1/N) Σi γi · log pi,    (11)

where i indexes all N pixel locations (h,w) in the image (flattened into a single dimension), γi is the corresponding confidence weight, and pi is the confidence of the prediction at that location, which can be calculated as:

pi = exp(zi,c*/τ) / Σc exp(zi,c/τ),    (12)

where zi,c is the logit of class c at location i and τ is a temperature hyperparameter. From Eq. (11), we can observe that high-quality predictions (i.e., where γi is high) contribute more to the loss function, which in turn helps the optimization process, while the influence of low-quality predictions (i.e., where γi is small) is alleviated.
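A minimal PyTorch sketch of this confidence-weighted loss is given below; the normalization over the selected pixels and the placement of the temperature τ follow our reading of the equations above and may differ from the authors' implementation:

```python
import torch
import torch.nn.functional as F

def robust_ce_loss(logits, pseudo_labels, gamma, tau=1.0):
    """Confidence-weighted cross-entropy on pseudo-labels (a sketch).

    logits:        (B, C, H, W) student outputs.
    pseudo_labels: (B, H, W) hard labels from the teacher.
    gamma:         (B, H, W) confidence weights from the dynamic threshold map.
    tau:           temperature applied before the softmax.
    """
    log_p = F.log_softmax(logits / tau, dim=1)                 # temperature-scaled log-probs
    nll = F.nll_loss(log_p, pseudo_labels, reduction="none")   # per-pixel cross-entropy
    # Weight each pixel by its confidence so low-quality pixels are suppressed.
    return (gamma * nll).sum() / (gamma.sum() + 1e-8)
```

Pixels with γ = 0 drop out of both the numerator and the effective normalizer, so unreliable predictions exert no gradient at all.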
3.5 Contrastive consistency
In Sects 3.3 and 3.4, we introduce two strategies for semi-supervised segmentation from the perspective of pseudo-labeling and entropy minimization. In this section, we will introduce a contrastive consistency regularization strategy.
A segmentation network usually consists of an encoder E to project the image into a latent representation and a decoder to reconstruct the representation into a segmentation map. Specifically, we input the images into the encoder to get the latent feature representations z = E(x). For a batch of images, we implement a series of image augmentation strategies, such as color jitter, random cropping, noise injection and cutout. Then, we have two groups of representations {zi} and {z̃i} for the original images and the augmented images, giving 2B representations in total. Motivated by contrastive learning [49,50], we propose to pull the representation of an image and the representation of its augmented view closer, while pushing it away from the representations of the other images in the batch. We consider a modified NT-Xent loss [50] as our loss function. Let sim(u,v) = uᵀv / (‖u‖·‖v‖) denote the dot product between l2-normalized u and v (i.e., cosine similarity). The loss function for a positive pair (i, j) is as follows:

ℓ(i,j) = −log [ exp(sim(zi, zj)/τ) / Σk 1(k ≠ i) · exp(sim(zi, zk)/τ) ],    (13)

where the sum over k runs over all 2B representations, τ is a temperature parameter, B is the batch size and 1(k ≠ i) is an indicator function evaluating to 1 iff k ≠ i. Then the contrastive loss function can be formulated as:

Lc = (1/2B) Σi [ℓ(i, ĩ) + ℓ(ĩ, i)],    (14)

where (i, ĩ) ranges over the B positive pairs of original and augmented views. Through contrastive consistency, the model can differentiate different samples and make consistent predictions for the same sample, which improves the generalization ability of the model.
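For reference, the modified NT-Xent loss can be sketched as below, a PyTorch version close to SimCLR's formulation; the pairing convention (row i of the two batches forming a positive pair) is our assumption:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, tau=0.5):
    """NT-Xent contrastive loss over a batch (sketch of the consistency term).

    z1, z2: (B, D) encoder representations of the original images and their
            augmented views; row i of z1 and row i of z2 are a positive pair.
    """
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # l2-normalize -> cosine similarity
    sim = z @ z.t() / tau                                # (2B, 2B) pairwise similarities
    # Mask self-similarity so each anchor is compared to the other 2B - 1 samples.
    mask = torch.eye(2 * B, dtype=torch.bool)
    sim.masked_fill_(mask, float("-inf"))
    # The positive for anchor i is its counterpart at index (i + B) mod 2B;
    # cross-entropy over the similarity rows implements Eqs. (13)-(14).
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return F.cross_entropy(sim, targets)
```

Because `F.cross_entropy` averages over the 2B anchor rows, this single call covers both directions ℓ(i, ĩ) and ℓ(ĩ, i).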
3.6 Synergistic interaction of components
The three core components of our framework work synergistically to enhance semi-supervised learning by addressing complementary aspects of the training process. The Dynamic Pseudo-label Threshold Map (DPTM) and Robust Entropy Minimization (REM) operate in the prediction space to ensure the reliability of pseudo-labels: DPTM acts as an adaptive filter to select high-confidence pixels, while REM further mitigates the impact of noise by weighting the optimization based on pixel-wise confidence. Complementing this, the Contrastive Consistency (CC) module operates in the feature space, enhancing the model’s discriminative power and global structural consistency. Crucially, these modules form a mutually reinforcing feedback loop: the improved feature representations learned via CC lead to more accurate predictions, which in turn enable DPTM to generate more precise thresholds and high-quality pseudo-labels for subsequent training iterations. This virtuous cycle ensures that the model progressively refines itself, effectively leveraging both labeled and unlabeled data.
4 Results
4.1 Datasets
4.1.1 ACDC dataset.
ACDC [20] contains cine-MRI recordings from 150 patients, acquired with multiple types of equipment, together with reference measurements and classifications from two medical experts. For every patient, it has around 15 volumes covering the entire cardiac cycle and expert annotations for the left and right ventricles and the myocardium.
4.1.2 MMWHS dataset.
MMWHS [21] consists of 20 cardiac MRI samples with expert annotations for seven structures: the left and right ventricles, the left and right atria, the pulmonary artery, the myocardium, and the aorta.
4.2 Evaluation metrics
Following previous works [2,5,51], we use the Dice Similarity Coefficient (DSC), Jaccard score (Jaccard), 95% Hausdorff Distance (95HD) and Average Surface Distance (ASD) for evaluation. DSC measures the overlap between the predicted segmentation A and the ground-truth segmentation B, defined as 2|A∩B|/(|A|+|B|). HD is defined as the maximum of the directed average Hausdorff distance h(A,B) and its reverse direction h(B,A). By these definitions, a higher DSC and a lower HD indicate better performance.
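For clarity, the two overlap metrics can be computed as follows (a NumPy sketch for binary masks; the convention of returning 1.0 when both masks are empty is our assumption):

```python
import numpy as np

def dice_score(pred, gt):
    """Dice similarity coefficient for binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect overlap
    return 2.0 * np.logical_and(pred, gt).sum() / denom

def jaccard_score(pred, gt):
    """Jaccard index for binary masks: |A∩B| / |A∪B|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```

For multi-class segmentation, these are typically computed per class on one-hot masks and then averaged.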
4.3 Implementation details
Following previous works [6,52,53], we use 1.25%, 2.5%, and 10% labeled data from ACDC and 10%, 20%, and 40% labeled data from MMWHS. We use UNet [26] as our backbone. We implement data augmentations to enlarge the datasets and avoid overfitting. All experiments are conducted on an Ubuntu desktop with RTX3090 GPUs. We use the SGD optimizer with an initial learning rate of 1e–2, momentum of 0.9 and weight decay of 5e–4. A cosine learning rate scheduler anneals the optimizer from the initial to the final learning rate. The batch size is set to 16 by default.
For the experimental setup, all volumetric data underwent consistent preprocessing, including per-volume intensity normalization and center-cropping for background removal. During training, we applied an extensive suite of data augmentations to both labeled and unlabeled data: random rotation, scaling (0.7–1.3), elastic deformations, random flipping (p = 0.5), and brightness adjustments (0.7–1.3). For both ACDC and MMWHS, we follow the official training/validation/test splits provided by the challenge organizers. We do not use early stopping, and the final checkpoint at the last training epoch is used for evaluation. For our method, all experiments (including main comparisons and ablation studies) are repeated with three fixed random seeds, and we report the mean ± standard deviation; baseline results follow either the numbers reported in their original papers or our single-run re-implementations under the same experimental setting.
We compare our method with several baselines and recent state-of-the-art semi-supervised cardiac segmentation methods: Self train [54], Data aug [55], Context Restoration [56], Mixmatch [40], Global [50], Global+Local [52], SemiContrast [44], PCL [57], ACINet [6], SSCI [53], PatchCL [19], UA-MT [58], SS-Net [59], DC-Net [60], CauSSL [61], BCP [47], SDCL [17], Strong teacher [62], and Improved-UniMatch [63].
To ensure fair and reproducible comparisons, state-of-the-art baseline methods are either taken from the results reported in their original papers or re-implemented under identical experimental conditions with consistent training budgets and data splits as our method. All methods share the same U-Net backbone, SGD optimizer with identical learning rate schedules, and are evaluated using the same hardware and evaluation metrics to ensure apples-to-apples comparisons.
4.4 Quantitative results
In Fig 2, we evaluate our proposed method on the ACDC and MMWHS datasets under different percentages of labeled data during training. For comparison, we add a fully supervised training baseline that uses 100% of the labeled data. The results on the ACDC dataset are shown in Fig 2A. The DSC and HD of the model using only 10% of the data are 0.906 and 1.33, respectively, while the DSC and HD of the fully supervised model are 0.915 and 1.645, respectively. We can observe that the performance using only 10% of the data is comparable to the fully supervised performance, which indicates the superiority of our method. In Fig 2B, the DSC and HD of the model using only 40% of the data are 0.855 and 2.02, respectively, while the DSC and HD of the fully supervised model are 0.863 and 1.986, respectively, from which we can draw similar conclusions. With a small portion of labeled data, our method can achieve results competitive with fully supervised training.
4.5 Comparison with state-of-the-arts
Table 2 presents the average DSC of different methods on the ACDC and MMWHS datasets under different percentages of labeled data during training. Overall, our proposed method achieves competitive or superior performance compared to other state-of-the-art methods on both datasets, with only a marginally lower DSC (–0.001) than SSCI on ACDC with 10% labeled data for training. Besides, we present a detailed comparison across different metrics on ACDC in Table 3. These results indicate that our method is highly competitive and often achieves the best performance across different settings.
Bold: best result. Underline: second best result.
In Fig 3, we select three baselines with good performances (ACINet, PatchCL and SSCI) from Table 2 to visually compare and assess their performance. Observing the six instances presented in the figure, it is apparent that our method exhibits a significant advantage over other approaches, particularly in scenarios where the boundary textures are unclear and the segmented regions are small. For example, in the third row, our method can accurately segment the red region, whereas the other baseline methods fail to do so. This superiority is further demonstrated in the sixth row, where our method successfully segments the small red region which is surrounded by the yellow region, whereas the other three baselines fail to detect it. Besides, in the first and second rows, the compared methods produce thicker segmentations of the green circle region, whereas our method generates results that are more similar to the ground truth. The above observations further validate the effectiveness and superiority of our method.
For ACDC, we use 2.5% labeled data. For MMWHS, we use 20% labeled data.
4.6 Dynamic pseudo-label threshold map
Pseudo-label-based methods are very popular in semi-supervised segmentation [11,18,19]. However, different from previous methods, our method proposes a dynamic pseudo-label threshold map that changes adaptively during the training process. Moreover, our dynamic pseudo-label threshold map can measure pixel-wise and class-wise confidence and can provide more accurate results. In Table 5, we compare the performance of our dynamic pseudo-label threshold map strategy with recent pseudo-label-based methods. From the table, we can observe that our pseudo-label strategy outperforms other recent state-of-the-art pseudo-label strategies. In particular, the less labeled data we use, the larger the margin by which our method leads the other pseudo-label strategies. This demonstrates the superiority of our dynamic pseudo-label strategy, which provides pixel-wise, class-wise and adaptive threshold maps.
Furthermore, we visualize our dynamic pseudo-label threshold map in Fig 4 for a better understanding. We visualize an example in the ACDC dataset. There are four classes representing the background, RV, Myo and LV, respectively. The values in the map represent the threshold of the corresponding class. From the figure, we can observe that our threshold map can measure the confidence of the corresponding class accurately, demonstrating the effectiveness of our dynamic pseudo-label threshold map.
c represents the class.
4.7 Per-structure analysis and model interpretability
To address the need for granular performance evaluation and assess the stability of our method, Table 4 presents the per-structure Dice similarity coefficient (DSC) and 95% Hausdorff Distance (95HD) with standard deviations across three independent runs. The results indicate that our method achieves balanced segmentation performance across the Right Ventricle (RV), Myocardium (Myo), and Left Ventricle (LV). Notably, the low standard deviations demonstrate the robustness of our approach, even with limited labeled data. The improvements are particularly significant for the Myocardium, where boundary definition is often ambiguous and challenging for semi-supervised methods.
Results are reported as mean ± standard deviation over 3 runs. RV: Right Ventricle, Myo: Myocardium, LV: Left Ventricle.
For a fair comparison, we only implement the pseudo-label part in these methods (denoted as †) and then use the pseudo-labels for self-training. We report the average DSC in the table. Bold: best result. Underline: second best result.
Furthermore, to investigate model behavior and analyze failure cases, we utilized Explainable AI (XAI) techniques. Fig 5 displays Grad-CAM visualizations for both failure and successful segmentation scenarios. The attention maps reveal that in successful cases (bottom row), the model focuses intensely on the specific cardiac structures, indicating that our Contrastive Consistency strategy effectively drives the network to learn discriminative features. In the failure case (top row), the attention is more diffuse or focused on incorrect boundary regions, highlighting the difficulty of segmenting low-contrast areas. This interpretability allows us to understand that while the model is generally robust, it can still struggle with extremely ambiguous boundaries.
The top row shows the failure results on the ACDC dataset and the bottom row shows the successful results on the MMWHS dataset.
Regarding predictive confidence and calibration, our Dynamic Pseudo-label Threshold Map (DPTM) implicitly functions as an uncertainty management mechanism. Instead of using a static threshold which may include poorly calibrated predictions, DPTM adapts to the learning state of the model class-wise. This effectively filters out high-uncertainty predictions from the pseudo-labeling process, ensuring that the model is trained primarily on reliable, well-calibrated predictions.
4.8 Ablation study
For a better understanding of our proposed method, we conduct a series of ablation experiments. The experiments can be divided into the effectiveness of the three main components of our method and loss trade-off terms.
4.8.1 Component analysis.
In Table 6, we provide a comprehensive overview of the ablation experiments conducted on the three proposed components. The results clearly demonstrate the positive impact of each component on the model’s performance. Notably, our dynamic pseudo-label threshold map strategy emerges as the most influential factor, significantly enhancing the model’s performance. Additionally, the robust entropy loss contributes as the second most important factor in improving the model’s performance. Besides, the synergy achieved by combining these three components yields consistent and substantial improvements in the model’s performance. These findings confirm the effectiveness and significance of our proposed components in augmenting the model’s capabilities.
For ACDC, we use 10% labeled data. For MMWHS, we use 40% labeled data. † denotes the best performance with only one component attached and * the best results with two components attached.
Furthermore, we visualize one example from the ACDC dataset in Fig 6 to validate the effectiveness of the three components. Without any of the proposed strategies, the model cannot detect the small red region; once any one of them is applied, the model begins to detect it, which validates the effectiveness of all three components. Moreover, compared with the ground truth, the variant without contrastive consistency can still accurately segment the different regions, whereas the variant without the dynamic pseudo-label threshold map cannot: it segments a much thicker green region than the ground truth and fails to capture the red region. This demonstrates that the dynamic pseudo-label threshold map strategy brings larger improvements than the other two components.
4.8.2 Loss trade-off.
In Sect 3.2, we introduce two trade-off weights, λ1 and λ2, to balance our three losses. To explore the impact of the different losses on model performance, we evaluate a series of values for λ1 and λ2 and present the results in Figs 7 and 8.

Fig 7 presents the impact of λ1 on the performance of the model. On both datasets, when the percentage of labeled data is low, a lower λ1 brings larger improvements than a higher one; correspondingly, as the percentage of labeled data increases, the λ1 that yields the best result also increases. This is because, with little labeled data, the model cannot generate very accurate pseudo-labels, so a high λ1 disturbs training; as the labeled percentage grows, pseudo-label quality improves, and a higher λ1 lets the model learn more from the unlabeled data. Moreover, the DSC scores remain relatively stable across values of λ1, indicating that our dynamic pseudo-label threshold map strategy is robust and not sensitive to λ1.

Fig 8 presents the impact of λ2 on the performance of the model. The model performs best when λ2 is around 0.1–0.15; as λ2 increases further, performance decreases, indicating that an overly large λ2 disturbs training. Based on these experiments, 0.1–0.15 is a good choice for λ2.
4.9 Computational efficiency analysis
As described in Sect 4.2, all experiments are conducted on an Ubuntu desktop equipped with NVIDIA RTX 3090 GPUs. As shown in Table 7, training takes approximately 1.25 seconds per batch and inference 15.8 ms per image. While our method involves additional computation for dynamic thresholding and contrastive learning, the inference speed remains efficient enough for real-time clinical applications. Memory usage is approximately 20.5 GB per GPU.
The training process incorporates DPTM, robust entropy minimization, and contrastive consistency to learn robust features. While these components introduce additional calculations during the optimization phase, they are essential for extracting high-quality pseudo-labels and enforcing structural consistency from limited data.
Crucially, the proposed modules (DPTM and CC) are only active during the training phase. During the inference phase, the model operates as a standard U-Net without any auxiliary branches or thresholding operations, ensuring a rapid inference speed of 15.8 ms per image. This design makes our framework highly suitable for clinical workflows where real-time segmentation is required, justifying the reasonable computational investment during training for superior segmentation accuracy.
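A per-image latency figure of this kind can be reproduced with a minimal timing harness; this is a generic sketch (the `forward` callable and iteration counts are placeholders), not our benchmarking code:

```python
import time

def ms_per_image(forward, image, warmup=10, iters=100):
    """Average forward-pass latency in milliseconds per image.
    Since DPTM and contrastive consistency are training-only, `forward`
    at inference is just the plain U-Net call with no auxiliary branches.
    (For GPU models, synchronize the device before reading the clock.)"""
    for _ in range(warmup):          # exclude warm-up from the measurement
        forward(image)
    start = time.perf_counter()
    for _ in range(iters):
        forward(image)
    return (time.perf_counter() - start) / iters * 1000.0
```

Averaging over many iterations after a warm-up phase avoids one-off costs such as lazy initialization skewing the reported latency.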
4.10 Domain adaptation analysis
To further evaluate the robustness of our method under domain shift scenarios, we conducted transfer learning experiments following PCL [57]. We pre-trained the model on the CHD dataset [64] and fine-tuned it on the MMWHS dataset with a limited number of labeled patients (M). As shown in Table 8, our method consistently outperforms state-of-the-art self-supervised and semi-supervised methods across different labeled data regimes, demonstrating its strong generalization capability across different scanners and protocols.
All methods are pre-trained on CHD and fine-tuned on MMWHS with varying numbers of labeled patients (M).
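The two-stage protocol (pre-train on CHD, then fine-tune on M labeled MMWHS patients) can be illustrated with a toy linear model; this is a stand-in sketch of the idea, not our actual network or training code:

```python
import numpy as np

def fine_tune(W_pretrained, X, Y, lr=0.1, epochs=200):
    """Toy analogue of the fine-tuning stage: start from pre-trained
    weights (the analogue of CHD pre-training) and run gradient descent
    on a small labeled set (the analogue of M labeled MMWHS patients),
    minimizing a least-squares loss."""
    W = W_pretrained.copy()
    for _ in range(epochs):
        residual = X @ W - Y          # prediction error on labeled data
        grad = X.T @ residual / len(X)
        W -= lr * grad
    return W
```

The point the experiment makes is that a good initialization from pre-training lets this second stage succeed with far fewer labeled examples than training from scratch.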
5 Limitations and future work
This work presents several limitations that offer directions for future research. First, while our method performs robustly in standard low-label settings (e.g., 10%), performance stability in extremely low-label regimes (e.g., 1–2 scans) remains a challenge, as the initial pseudo-labels may be too noisy to guide the dynamic thresholding effectively. Second, the proposed framework introduces computational overhead from the dynamic threshold mapping and contrastive learning components. Although inference time remains efficient, the training resource requirements are higher than simple baselines. Third, performance degradation under severe domain shift remains an inherent challenge across different imaging centers or modalities, requiring further investigation into cross-modality generalization techniques. Furthermore, validation is currently limited to cardiac structures in MRI and CT, leaving generalization to other organs and modalities for future work. Finally, regarding clinical translation, future work should focus on integrating the model into clinical workflows to evaluate its utility in real-world diagnostic scenarios.
Building on these limitations, promising future directions include integrating transformer architectures to capture long-range dependencies and extending the method to 3D volumetric segmentation.
6 Conclusion
In this paper, we propose three novel strategies for semi-supervised cardiac segmentation: a dynamic pseudo-label threshold map, robust entropy minimization and contrastive consistency, from the perspectives of pseudo-labeling, entropy minimization and consistency regularization, respectively. The dynamic pseudo-label threshold map strategy generates pixel-wise, class-wise and adaptive threshold maps that help obtain high-confidence pseudo-labels; robust entropy minimization adaptively optimizes the model; and contrastive consistency makes the model more robust. Extensive experiments demonstrate the effectiveness of our method, and ablation studies provide a deeper understanding of each component.
References
- 1.
Isensee F, Jaeger PF, Full PM, Wolf I, Engelhardt S, Maier-Hein KH. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. In: Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges: 8th International Workshop, STACOM 2017, Held in Conjunction with MICCAI 2017, Quebec City, Canada, September 10-14, 2017, Revised Selected Papers 8. Springer; 2018. p. 120–9.
- 2.
Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint 2021. https://arxiv.org/abs/2102.04306
- 3.
Kato S, Hotta K. Adaptive t-vmf dice loss for multi-class medical image segmentation. arXiv preprint 2022. https://arxiv.org/abs/2207.07842
- 4.
Tragakis A, Kaul C, Murray-Smith R, Husmeier D. The fully convolutional transformer for medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023. p. 3660–9.
- 5.
Rahman MM, Shokouhmand S, Bhatt S, Faezipour M. MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024. p. 404–13.
- 6.
Basak H, Ghosal S, Sarkar R. Addressing class imbalance in semi-supervised image segmentation: a study on cardiac MRI. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. 2022. p. 224–33.
- 7. Chaitanya K, Karani N, Baumgartner CF, Erdil E, Becker A, Donati O, et al. Semi-supervised task-driven data augmentation for medical image segmentation. Med Image Anal. 2021;68:101934. pmid:33385699
- 8. Zhang L, Wang X, Yang D, Sanford T, Harmon S, Turkbey B, et al. Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE Trans Med Imaging. 2020;39(7):2531–40. pmid:32070947
- 9. Zhang Y, Liao Q, Yuan L, Zhu H, Xing J, Zhang J. Exploiting shared knowledge from non-COVID lesions for annotation-efficient COVID-19 CT lung infection segmentation. IEEE J Biomed Health Inform. 2021;25(11):4152–62. pmid:34415840
- 10. Peng J, Estrada G, Pedersoli M, Desrosiers C. Deep co-training for semi-supervised image segmentation. Pattern Recognition. 2020;107:107269.
- 11. Shi Y, Zhang J, Ling T, Lu J, Zheng Y, Yu Q, et al. Inconsistency-aware uncertainty estimation for semi-supervised medical image segmentation. IEEE Trans Med Imaging. 2022;41(3):608–20. pmid:34606452
- 12. Wang P, Peng J, Pedersoli M, Zhou Y, Zhang C, Desrosiers C. Self-paced and self-consistent co-training for semi-supervised image segmentation. Med Image Anal. 2021;73:102146. pmid:34274692
- 13. Wang K, Zhan B, Zu C, Wu X, Zhou J, Zhou L, et al. Semi-supervised medical image segmentation via a tripled-uncertainty guided mean teacher model with contrastive learning. Med Image Anal. 2022;79:102447. pmid:35509136
- 14.
Luo X, Hu M, Song T, Wang G, Zhang S. Semi-supervised medical image segmentation via cross teaching between cnn and transformer. In: International conference on medical imaging with deep learning. PMLR; 2022. p. 820–33.
- 15.
Zhao Z, Hu J, Zeng Z, Yang X, Qian P, Veeravalli B, et al. MMGL: multi-scale multi-view global-local contrastive learning for semi-supervised cardiac image segmentation. In: 2022 IEEE International Conference on Image Processing (ICIP). 2022. https://doi.org/10.1109/icip46576.2022.9897591
- 16. Jiao R, Zhang Y, Ding L, Xue B, Zhang J, Cai R, et al. Learning with limited annotations: a survey on deep semi-supervised learning for medical image segmentation. Comput Biol Med. 2024;169:107840. pmid:38157773
- 17.
Song B, Wang Q. SDCL: students discrepancy-informed correction learning for semi-supervised medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. 2024. p. 567–77.
- 18.
Thompson BH, Di Caterina G, Voisey JP. Pseudo-label refinement using superpixels for semi-supervised brain tumour segmentation. In: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI). 2022. p. 1–5. https://doi.org/10.1109/isbi52829.2022.9761681
- 19.
Basak H, Yin Z. Pseudo-label guided contrastive learning for semi-supervised medical image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023. p. 19786–97.
- 20. Bernard O, Lalande A, Zotti C, Cervenansky F, Yang X, Heng P-A, et al. Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved?. IEEE Trans Med Imaging. 2018;37(11):2514–25. pmid:29994302
- 21. Zhuang X, Shen J. Multi-scale patch and multi-modality atlases for whole heart segmentation of MRI. Med Image Anal. 2016;31:77–87. pmid:26999615
- 22.
Tran PV. A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv preprint 2016. https://arxiv.org/abs/1604.00494
- 23.
Lieman-Sifry J, Le M, Lau F, Sall S, Golden D. FastVentricle: cardiac segmentation with ENet. In: International Conference on Functional Imaging and Modeling of the Heart. Springer; 2017. p. 127–38.
- 24. Bai W, Sinclair M, Tarroni G, Oktay O, Rajchl M, Vaillant G, et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. J Cardiovasc Magn Reson. 2018;20(1):65. pmid:30217194
- 25.
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015. p. 3431–40.
- 26.
Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer; 2015. p. 234–41.
- 27. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems. 2017;30.
- 28. Tajbakhsh N, Jeyaseelan L, Li Q, Chiang JN, Wu Z, Ding X. Embracing imperfect datasets: a review of deep learning solutions for medical image segmentation. Med Image Anal. 2020;63:101693. pmid:32289663
- 29. Yang X, Song Z, King I, Xu Z. A survey on deep semi-supervised learning. IEEE Trans Knowl Data Eng. 2023;35(9):8934–54.
- 30.
Springenberg JT. Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint 2015. https://arxiv.org/abs/1511.06390
- 31.
Dong J, Lin T. MarginGAN: adversarial training in semi-supervised learning. In: Wallach H, Larochelle H, Beygelzimer A, dAlché-Buc F, Fox E, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc.; 2019.
- 32.
Liu Y, Deng G, Zeng X, Wu S, Yu Z, Wong HS. Regularizing discriminative capability of CGANs for semi-supervised generative learning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 5719–28.
- 33.
Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. Autoaugment: learning augmentation policies from data. arXiv preprint 2018. arXiv:1805.09501
- 34. Xie Q, Dai Z, Hovy E, Luong T, Le Q. Unsupervised data augmentation for consistency training. Advances in Neural Information Processing Systems. 2020;33:6256–68.
- 35.
Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint 2016. arXiv:1609.02907
- 36.
Zhuang C, Ma Q. Dual graph convolutional networks for graph-based semi-supervised classification. In: Proceedings of the 2018 World Wide Web Conference. 2018.
- 37.
Wang H, Zhou C, Chen X, Wu J, Pan S, Wang J. Graph stochastic neural networks for semi-supervised learning. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20. Red Hook, NY, USA: Curran Associates Inc.; 2020.
- 38.
Lee DH. Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. 2013. https://api.semanticscholar.org/CorpusID:18507866
- 39.
Pham H, Dai Z, Xie Q, Le QV. Meta pseudo labels. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2021. p. 11557–68.
- 40. Berthelot D, Carlini N, Goodfellow I, Papernot N, Oliver A, Raffel CA. Mixmatch: a holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems. 2019;32.
- 41.
Berthelot D, Carlini N, Cubuk ED, Kurakin A, Sohn K, Zhang H. Remixmatch: semi-supervised learning with distribution alignment and augmentation anchoring. arXiv preprint 2019. https://arxiv.org/abs/1911.09785
- 42. Sohn K, Berthelot D, Carlini N, Zhang Z, Zhang H, Raffel CA. Fixmatch: simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems. 2020;33:596–608.
- 43. Xu Z, Wang Y, Lu D, Yu L, Yan J, Luo J, et al. All-around real label supervision: cyclic prototype consistency learning for semi-supervised medical image segmentation. IEEE J Biomed Health Inform. 2022;26(7):3174–84. pmid:35324450
- 44.
Hu X, Zeng D, Xu X, Shi Y. Semi-supervised contrastive learning for label-efficient medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021 : 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24. Springer; 2021. p. 481–90.
- 45.
Wu H, Wang Z, Song Y, Yang L, Qin J. Cross-patch dense contrastive learning for semi-supervised segmentation of cellular nuclei in histopathologic images. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022. p. 11656–65.
- 46.
Basak H, Bhattacharya R, Hussain R, Chatterjee A. An exceedingly simple consistency regularization method for semi-supervised medical image segmentation. In: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI). 2022. p. 1–4. https://doi.org/10.1109/isbi52829.2022.9761602
- 47.
Bai Y, Chen D, Li Q, Shen W, Wang Y. Bidirectional copy-paste for semi-supervised medical image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2023. p. 11514–24.
- 48. Tarvainen A, Valpola H. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems. 2017;30.
- 49.
He K, Fan H, Wu Y, Xie S, Girshick R. Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2020. p. 9729–38.
- 50.
Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: International conference on machine learning. 2020. p. 1597–607.
- 51. Chen C, Dou Q, Chen H, Qin J, Heng P-A. Synergistic image and feature adaptation: towards cross-modality domain adaptation for medical image segmentation. AAAI. 2019;33(01):865–72.
- 52. Chaitanya K, Erdil E, Karani N, Konukoglu E. Contrastive learning of global and local features for medical image segmentation with limited annotations. Advances in Neural Information Processing Systems. 2020;33:12546–58.
- 53.
Yuan Y, Wang X, Yang X, Li R, Heng PA. Semi-supervised class imbalanced deep learning for cardiac MRI segmentation. Berlin, Heidelberg: Springer; 2023. p. 459–69.
- 54.
Bai W, Oktay O, Sinclair M, Suzuki H, Rajchl M, Tarroni G, et al. Semi-supervised learning for network-based cardiac MR image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention; 2017.
- 55.
Chaitanya K, Karani N, Baumgartner CF, Becker A, Donati O, Konukoglu E. Semi-supervised and task-driven data augmentation. In: Information Processing in Medical Imaging: 26th International Conference, IPMI 2019, Hong Kong, China, June 2–7, 2019, Proceedings 26. Springer; 2019. p. 29–41.
- 56. Chen L, Bentley P, Mori K, Misawa K, Fujiwara M, Rueckert D. Self-supervised learning for medical image analysis using image context restoration. Med Image Anal. 2019;58:101539. pmid:31374449
- 57.
Zeng D, Wu Y, Hu X, Xu X, Yuan H, Huang M, et al. Positional contrastive learning for volumetric medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021 : 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24. Springer; 2021. p. 221–30.
- 58.
Zhang R, Liu S, Yu Y, Li G. Self-supervised correction learning for semi-supervised biomedical image segmentation. arXiv preprint 2023. arXiv:2301.04866
- 59.
Wu Y, Wu Z, Wu Q, Ge Z, Cai J. Exploring smoothness and class-separation for semi-supervised medical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer; 2022. p. 34–43.
- 60.
Chen F, Fei J, Chen Y, Huang C. Decoupled consistency for semi-supervised medical image segmentation. In: Greenspan H, Madabhushi A, Mousavi P, Salcudean S, Duncan J, Syeda-Mahmood T, et al., editors. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. Cham: Springer; 2023. p. 551–61.
- 61.
Miao J, Chen C, Liu F, Wei H, Heng P-A. CauSSL: causality-inspired semi-supervised learning for medical image segmentation. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). 2023. p. 21369–80. https://doi.org/10.1109/iccv51070.2023.01959
- 62. Qiu Y, Meng J, Li B. Semi-supervised strong-teacher consistency learning for few-shot cardiac MRI image segmentation. Comput Methods Programs Biomed. 2025;261:108613. pmid:39893807
- 63. Guo H, Li Y, Wang X, He R, Quan J, Wang L, et al. Semi-supervised cardiac MRI image segmentation via learning consistency under differential perturbations. Digital Signal Processing. 2026;168:105494.
- 64.
Xu X, Wang T, Shi Y, Yuan H, Jia Q, Huang M, et al. Whole heart and great vessel segmentation in congenital heart disease using deep neural networks and graph matching. In: Shen D, Liu T, Peters TM, Staib LH, Essert C, Zhou S, et al., editors. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Cham: Springer; 2019. p. 477–85.
- 65.
Gidaris S, Singh P, Komodakis N. Unsupervised representation learning by predicting image rotations. In: International Conference on Learning Representations; 2018. https://openreview.net/forum?id=S1v4N2l0-
- 66.
Chaitanya K, Erdil E, Karani N, Konukoglu E. Contrastive learning of global and local features for medical image segmentation with limited annotations. arXiv preprint 2020. https://arxiv.org/abs/2006.10511