
Source-free domain transfer algorithm with reduced style sensitivity for medical image segmentation

  • Jian Lin,

    Roles Conceptualization, Writing – original draft

    Affiliation Sichuan Academy of Medical Science and Sichuan Provincial People’s Hospital, Chengdu, China

  • Xiaomin Yu,

    Roles Data curation, Software

    Affiliation School of Electronic Engineering, Chengdu University of Information Technology, Chengdu, China

  • Zhengxian Wang,

    Roles Supervision, Validation

    Affiliation School of Electronic Engineering, Chengdu University of Information Technology, Chengdu, China

  • Chaoqiong Ma

    Roles Writing – review & editing

    machaoqiongcc@163.com

    Affiliation Sichuan Academy of Medical Science and Sichuan Provincial People’s Hospital, Chengdu, China

Abstract

In unsupervised transfer learning for medical image segmentation, existing algorithms face the challenge of error propagation due to inaccessible source domain data. To address this scenario, a source-free domain transfer algorithm with reduced style sensitivity (SFDT-RSS) is designed. SFDT-RSS initially pre-trains the source domain model using a generalization strategy and subsequently adapts the pre-trained model to the target domain without accessing source data. Specifically, SFDT-RSS applies an interpatch style transfer (ISS) strategy, based on self-training with a Transformer architecture, to minimize the pre-trained model’s style sensitivity, enhancing its generalization capability and reducing reliance on a single image style. Simultaneously, the global perception ability of the Transformer architecture enhances semantic representation to improve style generalization effectiveness. In the domain transfer phase, the proposed algorithm utilizes a model-agnostic adaptive confidence regulation (ACR) loss to adjust the source model. Experimental results on five publicly available datasets for unsupervised cross-domain organ segmentation demonstrate that, compared to existing algorithms, SFDT-RSS achieves segmentation accuracy improvements of 2.83%, 2.64%, 3.21%, 3.01%, and 3.32%, respectively.

1. Introduction

With the increasing stringency of privacy protection policies, pathology data of patients tends to be monopolized by large institutions such as hospitals and remains inaccessible to external parties [1]. Simultaneously, the large volume of medical image data results in significant resource consumption during storage, transmission, and data loading processes. These constraints make accessing medical image data increasingly challenging in practical healthcare scenarios [2,3]. Unsupervised transfer learning has gained favor among scholars due to its outstanding knowledge transfer capabilities and has found extensive applications in medical image segmentation to address inter-domain distribution differences. However, such algorithms often presume continuous access to source domain data throughout the entire model training process [4,5]. In actual scenarios, the inaccessibility of source domain data makes these methods difficult to apply.

Currently, some scholars have conducted research on knowledge transfer without access to source data, known as the source-free domain transfer problem [6]. Fig 1 illustrates the difference between the traditional transfer idea and the source-free domain transfer idea. Traditional unsupervised transfer learning requires the simultaneous availability of both source and target data and achieves inter-domain knowledge transfer through continuous optimization of the objective function. In contrast, source-free domain transfer methods first pre-train the segmentation model with source data and subsequently utilize only this model and target domain data to achieve knowledge transfer. Since the source domain data cannot be accessed, knowledge transfer across domains must be achieved indirectly, solely utilizing the information already obtained [7].

Fig 1. Comparison of unsupervised domain transfer and source-free domain transfer.

https://doi.org/10.1371/journal.pone.0309118.g001

Current source-free domain transfer algorithms are divided into two types [8]. The first type relies on distribution simulation techniques, which replicate the distribution of source data by leveraging the source model and target data, thus compensating for the unavailability of source data. The second type employs statistical normalization techniques, which adjust the model automatically by altering the parameters of the batch normalization layers within the convolutional model.

Typically, information in medical images can be categorized into content information and style information. Content information represents the semantic content of the images, providing clues for knowledge transfer, while style information contributes to inter-domain distribution differences [9]. Setting aside cross-modality scenarios, different medical images of the same body part of a patient carry consistent semantic information, reflecting similar pathological information. Therefore, in cross-domain medical image segmentation, research follows the principle of "transferring content information while avoiding style differences." For instance, conventional transfer learning approaches mitigate inter-domain stylistic discrepancies by employing methods such as style conversion, which aligns target images with the style of source images, thereby alleviating domain disparities [10,11]. Nonetheless, current source-free domain transfer algorithms concentrate exclusively on leveraging posterior distribution information. When confronted with cross-domain stylistic variations, these source-free models experience a notable degradation in segmentation accuracy. Owing to substantial style discrepancies between domains, applying existing source-free transfer strategies to source models leads to a continuous decline in segmentation accuracy during domain transfer, ultimately producing irregular segmentation outcomes. This is because existing source-free transfer strategies, in scenarios where source domain data is inaccessible, cannot address style shifts the way unsupervised transfer methods do. Instead, they focus solely on transferring content information, thereby confounding content and style information during domain transfer.
Additionally, since existing source-free strategies rely on the initial predictions of the model for updates, significant initial prediction biases make it challenging for the model to update in the correct direction, trapping existing source-free strategies in the "error diffusion" dilemma. Therefore, addressing style shift issues in source-free scenarios holds significant research value [12,13].

To this end, we develop SFDT-RSS to mitigate the impact of style shifts. SFDT-RSS initially trains the source model using a generalization approach and subsequently adapts the source model to the target domain without using source data. SFDT-RSS then applies the ISS strategy, based on self-training with a Transformer architecture, to minimize the pre-trained model’s style sensitivity, enhancing its generalization capability and reducing reliance on a single image style. Simultaneously, the global perception ability of the Transformer architecture enhances semantic representation to improve style generalization effectiveness. In the domain transfer phase, the proposed algorithm utilizes a model-agnostic adaptive confidence regulation (ACR) loss to adjust the source model. On this basis, SFDT-RSS completes source-free cross-domain medical image segmentation tasks more effectively.

The remainder of this work is organized as follows: Section 2 reviews the research status. Section 3 describes the design details. Section 4 presents a comprehensive experimental evaluation. Section 5 concludes the work.

2. Related works

2.1 Unsupervised domain transfer algorithms

Contemporary unsupervised domain transfer algorithms for medical image segmentation commonly align the data distributions across domains within the feature space. Based on their implementation strategies, these algorithms can be categorized into instance-based weighting and representation learning methods. Opbroek et al. [14] proposed the reweighted support vector machine (RSVM) for lesion segmentation. RSVM iteratively updates the model, reducing the weights of misclassified samples to minimize their influence on model updates. Perone et al. [15] introduced a sample selection algorithm based on knowledge distillation for spinal cord gray matter segmentation, which utilizes an exponential moving average strategy for parameter updating. During training, samples with high prediction consistency have a stronger impact on model updates. Instance-based weighting methods can demonstrate good performance under certain conditions. However, varying the sample weights across classes in the dataset may disrupt class balance, altering the prior distribution of the source domain and causing the classifier to consistently favor classes with higher total weight while neglecting classes with lower weight.

Wang et al. [16] introduced an adversarial learning approach driven by boundaries and entropy for retinal image segmentation, generating boundary segmentation predictions that are more accurate and closer to the source domain. Vu et al. [17] introduced an adversarial entropy minimization segmentation algorithm, which addresses distribution shift through pixel-level adversarial learning and entropy minimization. Zou et al. [18] studied enhancing the confidence of pseudo-labels for the target domain through a self-supervised mechanism. They iteratively improved the confidence of the pseudo-labels and retrained the model with the updated pseudo-labels in each iteration, thereby enhancing the adaptation of the target model. However, representation learning approaches assume that each sample in the source domain contributes equally to knowledge transfer. In practical data collection, random noise may cause samples to deviate markedly from the center of the source data distribution; treating these samples the same as others may lead to severe negative transfer, resulting in a decrease in model segmentation accuracy [18].

2.2 Source-free domain transfer algorithms

In real-world scenarios, the assumption that source domain medical image data is readily available is often invalid due to concerns regarding patient data privacy. As a result, traditional transfer algorithms become impractical. The concept of source-free domain transfer has garnered growing interest because it does not rely on access to source data [19]. Existing approaches to source-free domain transfer for medical image segmentation can be broadly classified into two categories: those centered on simulating source domain distributions and those focused on statistically regularizing model parameters. Chen et al. [20] developed the denoised pseudo-labeling (DPL) source-free algorithm for retinal fundus image segmentation based on uncertainty and prototype estimation. DPL estimates pixel-level prediction uncertainty under the self-supervised paradigm, identifies pseudo-labels with high uncertainty, and estimates category prototypes via prototype networks, calculating relative feature distances for pseudo-labels far from their corresponding class prototypes. Bateson et al. [21] utilize inter-class proportion predictors to obtain posterior information of the source domain, minimizing an unlabeled entropy loss defined on target domain data and integrating it into the overall optimization objective of the model in the form of a KL divergence. These methods tend to perform well when the distributional discrepancy is small. However, when the discrepancy is significant, large errors may occur in both pseudo-labels and generated data, resulting in error diffusion and a continuous decrease in model segmentation accuracy during training.

Statistical regularization-based source-free domain transfer methods adopt the opposite approach. They adjust the parameters of batch normalization layers in convolutional models to align the target domain features with those of the source domain, thereby regularizing the target domain features. Liu et al. [22] proposed a source-free transfer algorithm using adaptive batch statistics, which normalizes features of brain MRI images through low-order and high-order statistics. The algorithm highlights that imposing identical means and variances across domains reduces the model’s expressive capacity. Methods based on statistical regularization of model parameters require convolutional neural network structures, and they may likewise suffer from error diffusion.

3. Source-free domain transfer algorithm

3.1 Algorithm overview

The symbols and abbreviations used in this paper are presented in Table 1. The proposed framework for medical image segmentation is illustrated in Fig 2. Compared to existing source-free domain transfer algorithms, this algorithm focuses on mitigating the adverse impact of style variations to prevent error diffusion. SFDT-RSS adopts a strategy of generalization before transfer. Initially, it mitigates the source domain model’s sensitivity to style variations through the ISS mechanism, thus improving the model’s generalization capability and reducing its dependence on a singular style. Subsequently, in the absence of source data, the algorithm employs the ACR loss to gauge pixel correlation, adjusting the source model to the target domain by heightening classification certainty. This approach further mitigates the risk of misclassification stemming from style discrepancies, ultimately facilitating knowledge transfer in a source-free context.

Fig 2. Medical image segmentation workflow of the proposed algorithm SFDT-RSS.

https://doi.org/10.1371/journal.pone.0309118.g002

Table 1. Symbols and abbreviations in this paper.

https://doi.org/10.1371/journal.pone.0309118.t001

Here, the problem is defined. First, a source domain model Ms:Xs→Ys is pre-trained to classify each pixel of medical images. During transfer, the source domain Xs∈ℝH×W×1 and its corresponding labels Ys∈ℝH×W×C are not available. The research objective of the proposed algorithm is to accurately identify the unknown target domain Xt∈ℝH×W×1 without accessing Xs and Ys.

3.2 Interpatch style transfer strategy

3.2.1 Intra-image transfer.

Prior to network input, medical images are divided into K equally-sized and non-overlapping image patches using the Transformer’s patch processing approach. These patches serve as the network’s input. Here, represents the kth patch partitioned from image xn, and denotes a stochastic boosting operator. Stochastic boosting is applied to each patch contained in the image to obtain an augmented patch set, as shown below: (1)
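As an illustration of the patch partition described above, the following sketch splits a 224 × 224 grayscale image into K = 196 non-overlapping 16 × 16 patches and applies a stochastic boost to each patch. The brightness jitter is a hypothetical stand-in for the paper's boosting operator, which follows the style randomization of [36]:

```python
import numpy as np

def partition_patches(image, patch_size):
    """Split an H x W image into K non-overlapping patches (Transformer style)."""
    H, W = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    return (image
            .reshape(H // patch_size, patch_size, W // patch_size, patch_size)
            .transpose(0, 2, 1, 3)            # group by (row, col) of the patch grid
            .reshape(-1, patch_size, patch_size))

def stochastic_boost(patch, rng):
    """Hypothetical stand-in for the stochastic boosting operator: brightness jitter."""
    return np.clip(patch * rng.uniform(0.8, 1.2), 0.0, 255.0)

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(224, 224))
raw = partition_patches(img, 16)                      # K = 14 * 14 = 196 patches
aug = np.stack([stochastic_boost(p, rng) for p in raw])
print(raw.shape, aug.shape)                           # (196, 16, 16) (196, 16, 16)
```

Both the raw patch set and its augmented counterpart are then fed to the encoder, as described next.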

Then, the raw patch set Pn and the augmented patch set are input to the network. The encoder g extracts features from the input patches, denoted as and , respectively. The following objective function is used as a constraint to guarantee consistent feature distillation: (2)

In this equation, N denotes the number of images processed simultaneously and ϕ>0 serves as a weighting factor for constraint Φ(θ).
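A minimal sketch of the feature-consistency constraint in Eq (2), assuming a simple mean-squared distance between raw and augmented patch features; the exact form of the constraint Φ(θ) is not reproduced in the text, so this is illustrative only:

```python
import numpy as np

def consistency_loss(feat_raw, feat_aug, phi=0.1):
    """Assumed form of the constraint: phi-weighted mean squared distance
    between features of raw and augmented patch sets.
    feat_raw, feat_aug: (N, K, D) = (images per batch, patches, feature dims)."""
    return phi * float(np.mean(np.sum((feat_raw - feat_aug) ** 2, axis=-1)))

rng = np.random.default_rng(1)
f_raw = rng.normal(size=(4, 196, 192))               # N=4, K=196, D=192
f_aug = f_raw + 0.01 * rng.normal(size=f_raw.shape)  # slightly perturbed features
loss = consistency_loss(f_raw, f_aug)                # small positive value
```

Identical feature sets yield zero loss, so the constraint only penalizes style-induced feature drift.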

Although the source domain model enhances model generalization by extracting style-invariant features, it cannot guarantee the presence of effective semantic information in the features extracted by the encoder g. Hence, ISS generates random combinations of raw patches and augmented patches while preserving the order of the original patches.

(3)

In the equation, M denotes the number of random combinations, m stands for the mth random combination, r denotes the image reconstruction sub-network, takes a probability of 1/3 for each value, and fbg represents the features of patches with black backgrounds.

3.2.2 Inter-image transfer.

According to previous work [22], low-order feature statistics have domain specificity due to differences in distribution representation. Moreover, using μ and σ as style factors for semantic segmentation has been proven feasible in previous work [23]. Given an image pair (xa,xb), ISS randomly selects patches from Pa and Pb at all positions. The selected patches are arranged in the original order, i.e., , where is randomly sampled with a probability of 1/2. Each image pair (xa,xb) is randomly selected from the current batch of images, with xa ≠ xb. Since there are inherent style differences between medical images, no additional image augmentation is required for inter-image generalization. Based on this, the encoder g can distill patch features . Subsequently, the style factors μ and σ can be calculated separately, and then the style factors are exchanged and normalized between these features. The exchange and normalization process can be expressed as follows: (5) where μj and σj are the style factors of patch features , and represents the result after exchange and normalization. The next step involves acquiring M random feature combinations for each batch of images to facilitate subsequent image reconstruction. Here, Eq (3) is reformulated accordingly: (6) where a and b represent the indices of image pairs randomly selected from [1, N], with a ≠ b.
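The style-factor exchange of Eq (5) can be sketched as an AdaIN-style operation: each patch feature is normalized by its own mean and standard deviation, then re-styled with its partner's statistics. The per-patch granularity and the small stabilizing epsilon are assumptions:

```python
import numpy as np

def style_stats(feat):
    """Low-order statistics used as style factors: per-patch mean and std."""
    mu = feat.mean(axis=-1, keepdims=True)
    sigma = feat.std(axis=-1, keepdims=True) + 1e-6   # epsilon for stability (assumed)
    return mu, sigma

def exchange_style(feat_a, feat_b):
    """Sketch of the exchange-and-normalize step: strip each feature's own
    style factors, then apply the partner's (AdaIN-like)."""
    mu_a, sig_a = style_stats(feat_a)
    mu_b, sig_b = style_stats(feat_b)
    a_restyled = sig_b * (feat_a - mu_a) / sig_a + mu_b
    b_restyled = sig_a * (feat_b - mu_b) / sig_b + mu_a
    return a_restyled, b_restyled

rng = np.random.default_rng(2)
fa = rng.normal(loc=2.0, scale=3.0, size=(196, 192))   # K patches, D feature dims
fb = rng.normal(loc=-1.0, scale=0.5, size=(196, 192))
fa_new, fb_new = exchange_style(fa, fb)
# fa_new now carries fb's per-patch mean/std while keeping fa's content layout
```

Because content is carried by the normalized residual, the exchange perturbs style without disturbing semantic structure.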

Based on this, the optimization objective function for inter-image transfer is expressed by: (7)

3.2.3 Overall optimization objective of cross-patch style generalization and fine-tuning segmentation sub-network.

The overall optimization objective of reducing the style sensitivity of the source domain model can be represented as the superposition of optimization objectives: (8)

After reducing the style sensitivity through intra-image and inter-image transfer, a sub-network s requires training to facilitate medical image segmentation. Thus, the sub-network s undergoes fine-tuning on the source domain, incorporating original and augmented features and leveraging cross-entropy and Dice loss functions. Additionally, since the encoder g is expected to dominate the entire fine-tuning process, the learning rate of r should be significantly lower than that of g. The optimization objective of the segmentation sub-network s is: (9)
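A hedged sketch of the segmentation objective in Eq (9), combining pixel-wise cross-entropy with soft Dice loss for the binary case; equal weighting of the two terms is an assumption, since the text does not state the mixing coefficients:

```python
import numpy as np

def dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss for binary segmentation (prob, target: H x W in [0, 1])."""
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def bce_loss(prob, target, eps=1e-7):
    """Pixel-wise binary cross-entropy (cross-entropy for the two-class case)."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def seg_loss(prob, target):
    """Assumed combination: cross-entropy plus Dice with equal weights."""
    return bce_loss(prob, target) + dice_loss(prob, target)

target = np.zeros((8, 8)); target[2:6, 2:6] = 1.0
perfect = target.copy()
print(round(seg_loss(perfect, target), 4))   # ~0.0 for a perfect prediction
```

Cross-entropy drives per-pixel correctness while Dice counteracts class imbalance between organ and background pixels.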

3.3 Adaptive confidence regularization

After the pre-training phase, SFDT-RSS initializes a new target model utilizing the pre-trained model. To further reduce the impact of style shift, we propose the ACR loss, which mitigates classification errors and addresses overfitting by employing a dynamic scaling mechanism.

3.3.1 Confidence maximization.

The ACR loss reduces classification confusion by maximizing classification confidence. The correlation between any two classes u and v is defined as: (10) where represents the probability of each pixel in the image belonging to class u. The correlation ruv signifies the likelihood of a pixel belonging to both u and v, and the confidence can be estimated as: (11)

The logits center for u is computed by: (12) where Ziu represents the center of logits for the ith pixel output of class u. Based on this, the representation of adaptive weights is in the form of a diagonal matrix: (13)

Thus, the confidence described in Eq (11) can be re-expressed as: (14)

The objective of the ACR loss is to enhance discernibility, leading to clearer decision boundaries and less classification confusion: (15)

Since high confidence predictions with errors only have a minor impact, Eq (15) can be selected as the confidence indicator to prevent error diffusion.
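The confidence maximization above can be sketched as follows: class-correlation terms r_uv are estimated from per-pixel probabilities as in Eq (10), and confidence rises as self-correlation dominates cross-class correlation. The adaptive weights of Eqs (12)-(13) are omitted, so this is an illustrative surrogate rather than the exact ACR loss:

```python
import numpy as np

def class_correlation(prob):
    """Sketch of Eq (10): r_uv estimates how much probability mass pixels
    place jointly on classes u and v (prob: n_pixels x C softmax outputs)."""
    return prob.T @ prob / prob.shape[0]      # C x C matrix of r_uv terms

def acr_confidence(prob):
    """Illustrative confidence surrogate: self-correlation (diagonal) minus
    cross-class confusion (off-diagonal). Maximizing it sharpens decision
    boundaries; the paper's adaptive weighting is not reproduced here."""
    r = class_correlation(prob)
    return float(np.trace(r) - (r.sum() - np.trace(r)))

sharp = np.array([[0.99, 0.01], [0.02, 0.98]])   # confident predictions
fuzzy = np.array([[0.55, 0.45], [0.48, 0.52]])   # confused predictions
print(acr_confidence(sharp) > acr_confidence(fuzzy))  # True
```

Because the loss maximizes this confidence, pixels near decision boundaries are pushed toward a single class, reducing classification confusion.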

Algorithm 1 SFDT-RSS workflow

Data: Source domain medical images: , target domain medical images: .

Results: Target domain model Mt.

1. Pre-train source domain model Ms;

2. Divide source domain medical images xsXs into non-overlapping blocks;

3. Perform data augmentation on original blocks using random enhancement operator ;

4. while reconstruction sub-network r not converged do

5. Extract features of original blocks and augmented blocks via encoder g;

6. Enhance model’s generalization ability using ℒcpsg through intra-image and inter-image generalization;

7. end

8. Fine-tune segmentation sub-network using cross-entropy and Dice losses to obtain Ms;

9. Adjust source domain model to obtain target domain model Mt (without using source domain data):

10. Initialize target domain model Mt with source domain model Ms;

11. while Target domain model Mt not converged do

12. Calculate loss ℒacr based on model confidence and optimize model;

13. Employ dynamic scaling strategy to scale model output, avoiding overfitting;

14. end

Output: Converged target domain model Mt.

3.3.2 Dynamic scaling mechanism.

Earlier research [24] has demonstrated that methods often overfit to the operating domain, causing model predictions to tend toward a uniform distribution. In contrast to previous work [25], which uses fixed temperature coefficients for rescaling, the ACR loss adopts an adaptive regularization strategy. The scaling factor is set to 1 during the original training process, so the dynamic regularization strategy does not take effect. When confidence is high, the impact of the scaling factor on model predictions can be ignored.

(16)
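One plausible reading of the dynamic scaling mechanism is sketched below: the scaling factor stays at 1 when confidence is high (leaving predictions untouched) and sharpens the output distribution otherwise. The specific low-confidence schedule is an assumption, as Eq (16) is not reproduced in the extracted text:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dynamic_scale(logits, conf, conf_threshold=0.9):
    """Assumed scaling rule: factor 1 at high confidence (no effect on the
    prediction), otherwise divide logits by conf < 1 to sharpen the output
    and counter the drift toward uniform predictions."""
    factor = 1.0 if conf >= conf_threshold else conf   # conf in (0, 1]
    return softmax(logits / max(factor, 1e-6))

logits = np.array([1.0, 0.2, -0.5])
p_high = dynamic_scale(logits, conf=0.95)   # factor 1: plain softmax
p_low = dynamic_scale(logits, conf=0.5)     # factor < 1: sharper distribution
print(p_low.max() > p_high.max())           # True
```

Unlike a fixed temperature, the factor here adapts to the current confidence, matching the described contrast with [25].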

3.4 Algorithm summary

Through the training strategies outlined above, the problem of error diffusion in the source-free scenario can be effectively mitigated. This section outlines the training procedure of SFDT-RSS, which consists of two distinct stages: pre-training and domain transfer. The first phase focuses mainly on diminishing the dependency of the source domain model on a sole image style while bolstering the model’s capacity for generalization. The second phase is responsible for cross-domain knowledge transfer, further reducing the impact of style shift by reducing classification confusion. The procedure of SFDT-RSS is outlined in Algorithm 1.

In the pre-training phase, after data augmentation by the random enhancement operator , deep features are extracted by the encoder g from the preprocessed source domain medical images. Based on the image reconstruction task and the reconstruction sub-network r, the model’s generalization ability is enhanced through intra-image and inter-image transfer mechanisms using ℒcpsg. Then, the segmentation sub-network s is fine-tuned using a combination loss ℒseg. This yields a source domain model with generalization capability across various stylistic contexts.

During the domain transfer phase, the target domain model is established by initializing it with the source domain model, after which this model is employed to predict segmentation on medical images from the target domain. Benefiting from the improved model generalization ability in pre-training, the predicted results of this model have high confidence. Then, based on these predicted results, the classification confidence is calculated, and the model is gradually corrected through optimization objective ℒacr to maximize this confidence, thereby enhancing discernibility. Simultaneously, a dynamic regularization mechanism is employed to adaptively scale the model’s predicted results, preventing overfitting and enhancing the final model’s segmentation performance.

4. Experimental validation

4.1 Dataset and preprocessing

To assess SFDT-RSS in the source-free cross-domain scenario for medical image segmentation, simulations and analyses were conducted on publicly available liver segmentation datasets LiTS [26] and Synapse [27], as well as retinal datasets REFUGE [28], Drishti-GS [29], and RIM-ONE-r3 [30]. The dataset information is summarized in Table 2.

Table 2. Information on the datasets used in this study.

https://doi.org/10.1371/journal.pone.0309118.t002

The LiTS benchmark dataset includes 131 abdominal CT images, 107 of which contain lesions. This study focuses solely on liver segmentation. The 131 scans were sliced to obtain 2D images, resulting in a total of 58,638 images. Each image was resized to 224 × 224 pixels, with pixel values scaled to [0, 255] to obtain grayscale images for liver segmentation.
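The slicing and rescaling pipeline described above can be sketched as follows; the nearest-neighbour interpolation and min-max intensity windowing are assumptions, since the paper does not specify them:

```python
import numpy as np

def rescale_intensity(ct_slice):
    """Map a CT slice's raw values to [0, 255] grayscale (min-max, assumed)."""
    lo, hi = ct_slice.min(), ct_slice.max()
    return (ct_slice - lo) / max(hi - lo, 1e-6) * 255.0

def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbour resize to 224 x 224 (simple stand-in; the paper
    does not state the interpolation used)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

slice_raw = np.random.default_rng(3).normal(40, 200, size=(512, 512))
gray = rescale_intensity(resize_nearest(slice_raw))
print(gray.shape)   # (224, 224), values in [0, 255]
```

Each 3D scan would be iterated slice by slice through this pipeline to build the 2D training set.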

The Synapse multi-organ segmentation dataset consists of 50 abdominal CT images, randomly selected from scans of ongoing colorectal cancer chemotherapy trials and abdominal hernia studies. In this study, 30 abdominal CT images were chosen. Each CT image comprised 85 to 198 slices of 224 × 224 pixels, resulting in a total of 3,779 abdominal clinical slices.

The REFUGE dataset contains 1,200 color fundus photographs (CFP) stored in JPEG format, each 8-bit per color channel, collected by ophthalmologists or technicians from upright seated patients. In this study, the original images were independently cropped multiple times and processed into 224 × 224 grayscale images, totaling 6,000 fundus images for optic disc and cup segmentation experiments.

The Drishti-GS dataset consists of retinal images manually annotated at the pixel level by multiple experts. In the released dataset, the fundus region was extracted by removing the non-fundus mask areas from the original images, resulting in fundus images with a resolution of 2047 × 1760 pixels. In this study, the dataset was processed similarly, and due to data limitations, it was used solely as the target domain.

The RIM-ONE-r3 dataset comprises retinal images, all captured under specific flash intensity settings using a Nidek AFC-210 fundus camera with a resolution of 21.1 megapixels. In this study, the dataset was used solely as the target domain.

4.2 Experimental design

To demonstrate the effectiveness of SFDT-RSS, this study compared it with three categories of methods: the best existing supervised learning method, traditional transfer methods, and source-free transfer methods. The experiments encompassed liver segmentation as well as retinal optic disc and cup segmentation using the aforementioned datasets. Each experiment focused solely on the segmentation annotation of one category of pixels, i.e., binary image segmentation. The task LiTS → Synapse was used for liver segmentation validation, while the tasks REFUGE → RIM-ONE-r3 and REFUGE → Drishti-GS were used for retinal optic disc and cup segmentation validation. The specifics of the three comparison settings are as follows:

Comparison with supervised learning methods: This aims to showcase the best results achievable by existing deep learning algorithms and analyze the performance loss in the source-free cross-domain scenario. During training, supervised models were trained by accessing both target domain data and their annotations. It’s worth noting that the source-free domain transfer model proposed in this paper only requires access to target domain data during training, without accessing source domain data or target domain annotations.

Comparison with traditional transfer methods: Traditional unsupervised domain transfer algorithms require access to source data and corresponding labels, and unlabeled target data during training. Comparing with such methods aims to evaluate model performance when source data is unavailable.

Comparison with source-free domain transfer methods: The proposed SFDT-RSS belongs to this category of methods, where only the source domain model and target domain data can be utilized in the domain transfer phase, without accessing source domain data. Comparing with similar methods aims to more intuitively evaluate model performance.

4.3 Compared algorithms

To verify the effectiveness of SFDT-RSS in addressing the source-free cross-domain medical image segmentation problem, various studies in medical image segmentation were selected as benchmarks based on the three comparison paradigms described in the previous section.

4.3.1 Supervised learning method.

TransUNet [27]: Selected as a performance benchmark to analyze the performance loss caused by limiting factors in various scenarios.

4.3.2 Traditional domain transfer method.

AJTDA [31]: Compared to analyze the performance loss when source domain data cannot be accessed.

4.3.3 Source-free domain transfer method.

SFUDA [32]: Aligns the data distribution based on uncertainty and prior distribution perception strategies.

UBNA [33]: By incorporating an exponentially decaying momentum factor, the method adjusts a portion of the normalization layer statistics to match the target domain, thus facilitating model transfer.

SRDA [21]: Guides model transfer by using domain-invariant prior knowledge and minimizes the unlabeled entropy loss defined on target data.

DAE [34]: Adjusts target domain images through an image normalization submodule and explores implicit priors in prediction results, then models these implicit priors using independently trained denoising autoencoders.

DPL [20]: Utilizes two complementary denoising schemes at the pixel level and class level, involving uncertainty and prototype estimation, respectively.

4.4 Analysis of experimental results

This section presents the relevant experimental results, including source-free cross-domain segmentation results of liver, optic disc, and cup in medical images, along with corresponding ablation experiments, computational complexity analysis, and parameter sensitivity analysis [35]. Given the relatively small dataset size, a lightweight ViT configuration was used in this experiment, with embedding dimension set to 192, embedding layers set to 4, and embedding heads set to 4. For the random augmentation operator , the style randomization method from [36] was adopted, allowing an unlimited number of random style augmentation choices. The AdamW optimizer was used for model training with a learning rate of 1e−4 and a weight decay rate of 1e−3. The sensitivity of model parameters will be analyzed in detail in subsequent sections.

4.4.1 Comparative analysis of source-free cross-domain segmentation experiment results.

To assess the effectiveness of SFDT-RSS in solving the source-free cross-domain medical image segmentation problem, this section presents the comparison results and discussion of our algorithm against supervised learning, traditional domain transfer, and source-free domain transfer methods. First, the precision of the proposed algorithm and the comparison algorithms is reported in terms of the Dice coefficient and Average Surface Distance (ASD) for each task, along with the standard deviation of experimental accuracy. The segmentation accuracy of the model is positively correlated with the Dice coefficient and negatively correlated with the ASD metric. TransUNet, as the best existing supervised segmentation model, is used to demonstrate the upper bound of performance achievable by deep learning models on the corresponding segmentation tasks.
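For reference, minimal implementations of the two evaluation metrics, assuming binary masks; this ASD uses a simple 4-neighbourhood boundary and brute-force nearest distances, whereas production toolkits use more efficient surface extraction:

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice coefficient (%) between two binary masks: pixel-overlap measure."""
    inter = np.logical_and(pred, gt).sum()
    return 100.0 * 2.0 * inter / (pred.sum() + gt.sum())

def boundary(mask):
    """Coordinates of 4-neighbourhood boundary pixels of a binary mask."""
    pad = np.pad(mask, 1)
    interior = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
    return np.argwhere(mask & ~interior)

def average_surface_distance(pred, gt):
    """Naive symmetric ASD: mean nearest-boundary distance in both directions
    (contour-similarity measure; lower is better)."""
    bp, bg = boundary(pred), boundary(gt)
    d = np.linalg.norm(bp[:, None, :] - bg[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

gt = np.zeros((32, 32), dtype=bool); gt[8:24, 8:24] = True
pred = np.zeros_like(gt); pred[9:25, 8:24] = True   # shifted one pixel down
print(round(float(dice_coefficient(pred, gt)), 2))  # 93.75
```

Dice captures region overlap while ASD penalizes contour misalignment, which is why the two metrics are reported together.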

(1) LiTS → Synapse. Table 3 presents the accuracy comparison of liver segmentation on LiTS → Synapse task. A higher Dice coefficient and lower ASD indicate better segmentation performance. The Dice coefficient achieved by the supervised learning method TransUNet is 96.66%, with an ASD metric of 10.87, representing the highest performance achievable by existing deep learning models. The Dice coefficient of SFDT-RSS proposed in this paper is 1.84% lower than TransUNet, with an ASD metric 1.07 higher, indicating that SFDT-RSS achieves segmentation results close to the best existing supervised learning method in terms of both pixel overlap and contour similarity, effectively demonstrating the validity of our algorithm. Compared with traditional unsupervised domain transfer methods, SFDT-RSS is 0.40% lower in Dice coefficient than the AJTDA algorithm; for the ASD metric, SFDT-RSS is close to AJTDA, only 0.05 behind. This is mainly due to the inability of the proposed algorithm to access source domain data during training. Compared with source-free domain transfer methods, SFDT-RSS has significant performance advantages. In terms of the Dice coefficient, SFDT-RSS is 6.42%, 5.14%, 5.61%, 3.49%, and 2.83% higher than SFUDA, UBNA, SRDA, DAE, and DPL, respectively. For the ASD metric, SFDT-RSS is 1.31, 4.38, 2.82, 4.80, and 2.70 lower than existing methods, respectively, indicating that the proposed algorithm is superior to existing source-free domain transfer methods. This is because SFDT-RSS addresses the issue of error diffusion caused by style shift. Additionally, it can be observed that SFDT-RSS has a smaller standard deviation, indicating excellent stability of our algorithm.

Table 3. Performance comparison of all methods on the LiTS → Synapse task.

https://doi.org/10.1371/journal.pone.0309118.t003

(2) REFUGE → RIM-ONE-r3. Table 4 shows the accuracy comparison of optic disc and cup segmentation experiments on the REFUGE → RIM-ONE-r3 task. Similarly, TransUNet indicates the best performance achievable by deep learning models on the target domain dataset. Its Dice coefficient for optic disc segmentation task is 97.83% with an ASD metric of 8.63; for the cup segmentation task, the Dice coefficient is 88.60% with an ASD metric of 8.52. Compared with TransUNet, SFDT-RSS exhibits some gaps. This indicates that extracting source domain data distribution information and conducting knowledge transfer for this task in the source-free cross-domain scenario is quite challenging. However, from the table, it can be seen that, compared to existing traditional domain transfer and source-free domain transfer methods, SFDT-RSS still has significant performance advantages.

Table 4. Performance comparison on REFUGE → RIM-ONE-r3 task.

https://doi.org/10.1371/journal.pone.0309118.t004

Compared with traditional unsupervised domain transfer methods, for the optic disc segmentation task, the Dice coefficient of SFDT-RSS is 1.33% lower than that of the AJTDA algorithm, with an ASD metric 0.07 higher, indicating that the two methods produce very similar segmentation results for the optic disc. For the cup segmentation task, the Dice coefficient of SFDT-RSS is 2.37% lower than that of AJTDA, with an ASD metric 3.28 higher, indicating that the restriction of not being able to access source domain data in the source-free scenario has a greater negative impact on cup segmentation. This is mainly because cup images are strongly influenced by surrounding tissues, making the cup more difficult to segment than the optic disc. Compared with source-free domain transfer methods, for the optic disc segmentation task, the Dice coefficient of SFDT-RSS is 6.96%, 6.90%, 3.99%, 4.31%, and 2.64% higher than SFUDA, UBNA, SRDA, DAE, and DPL, respectively, and its ASD metric is 9.51, 4.64, 4.77, 2.56, and 0.54 lower, respectively. For the cup segmentation task, the Dice coefficient of SFDT-RSS is 8.50%, 6.45%, 4.08%, 4.15%, and 3.21% higher than SFUDA, UBNA, SRDA, DAE, and DPL, respectively, and its ASD metric is 4.25, 4.27, 2.84, 3.28, and 2.45 lower, respectively. These results indicate that SFDT-RSS has a significant advantage over existing source-free domain transfer methods on this task.

(3) REFUGE → Drishti-GS task. Table 5 presents the accuracy comparison of optic disc and cup segmentation experiments on the REFUGE → Drishti-GS task. TransUNet achieved Dice coefficients of 98.40% and 92.33%, with ASD metrics of 4.55 and 8.56 for optic disc and cup segmentation tasks, respectively. For the optic disc segmentation task, SFDT-RSS has a Dice coefficient 1.54% lower and an ASD metric 1.47 higher than TransUNet; for cup segmentation, its Dice coefficient is 6.37% lower and ASD metric 2.64 higher than TransUNet. These data indicate that SFDT-RSS performs close to supervised learning methods in this task without accessing source data, effectively demonstrating the validity of SFDT-RSS.

Table 5. Performance comparison of methods on REFUGE → Drishti-GS task.

https://doi.org/10.1371/journal.pone.0309118.t005

Compared with traditional unsupervised domain transfer methods, for the optic disc segmentation task, SFDT-RSS has a Dice coefficient 0.15% lower and an ASD metric 1.80 higher than the AJTDA algorithm; for the cup segmentation task, the Dice coefficient of SFDT-RSS is 3.05% lower and the ASD metric 0.57 higher than AJTDA. Similar to the REFUGE → RIM-ONE-r3 task, the results indicate that both methods produce similar segmentation results for the optic disc, while the limitation of not being able to access source domain data in the source-free scenario has a greater negative impact on cup segmentation.

Compared with source-free domain transfer methods, SFDT-RSS exhibits the best performance in both optic disc and cup segmentation. Compared to the suboptimal method DPL, SFDT-RSS achieves a Dice coefficient 3.01% higher and an ASD metric 1.12 lower for the optic disc; for the cup, its Dice coefficient is 3.32% higher and its ASD metric 0.97 lower. The advantage over the remaining source-free domain transfer methods is even larger. Thus, SFDT-RSS outperforms existing source-free domain transfer methods on this task.

(4) Visualization results analysis. Figs 3 and 4 respectively show the visual segmentation results of liver, optic disc (top), and cup (bottom), including a comparison of segmentation results between SFDT-RSS and existing source-free domain transfer methods. The liver segmentation result image is taken from the LiTS → Synapse task, while the optic disc and cup segmentation results are taken from the REFUGE → Drishti-GS task. The leftmost image represents the target domain test data, the rightmost image shows the corresponding pixel-level annotations, and the remaining images display segmentation results of various methods arranged from left to right in ascending order of segmentation accuracy. It can be observed from the figures that for the liver segmentation task, the segmentation effect of the proposed algorithm is significantly better in detail compared to other methods. For example, compared to the suboptimal method DPL, SFDT-RSS is more effective in segmenting image edges, with shapes closer to the original data annotations and no misclassification of pixels outside the liver. For the optic disc and cup segmentation tasks, the presence of large pixel value transitions in the original images poses great difficulties for accurate segmentation by the model, resulting in less-than-ideal segmentation results for all methods. However, compared to other methods, the segmentation results of SFDT-RSS are closer to the ground truth annotations, demonstrating the performance advantage.

Fig 3. Results of liver segmentation task LiTS → Synapse.

https://doi.org/10.1371/journal.pone.0309118.g003

Fig 4. Results of optic disc (top) and cup (bottom) segmentation task REFUGE → Drishti-GS.

https://doi.org/10.1371/journal.pone.0309118.g004

4.4.2 Ablation analysis.

Ablation experiments are designed to evaluate SFDT-RSS, covering the performance gains from the source-free strategy, the ISS module, the ACR loss, and random masking. Additionally, since the ACR loss is model-agnostic, we combine it with other existing source-free domain transfer methods to further evaluate its effectiveness.

(1) Performance gain analysis of the source-free domain transfer strategy. To showcase the effectiveness of SFDT-RSS, we conduct a comparative analysis against baseline models across tasks to quantify the performance gains brought by the source-free strategy embedded in SFDT-RSS. The baseline model uses the ViT backbone network of SFDT-RSS, trained without the ISS mechanism and the ACR loss, and is then tested on the target domain. Compared with the baseline model, the source-free domain transfer strategy in SFDT-RSS yields significant improvements across all tasks. For the LiTS → Synapse liver segmentation task, the Dice coefficient increases by 21.43%. For the REFUGE → RIM-ONE-r3 fundus segmentation task, the optic disc Dice coefficient improves by 18.72% and the cup Dice coefficient by 20.68%. Similarly, for the REFUGE → Drishti-GS fundus segmentation task, the optic disc Dice coefficient improves by 20.44% and the cup Dice coefficient by 19.57%. These gains are mainly attributed to the ISS mechanism, which enhances the model's generalization ability during source domain training and reduces its sensitivity to image style, thereby improving the initial segmentation accuracy on target domain medical images and, to some extent, avoiding error diffusion. On this basis, SFDT-RSS adapts the source model to the target domain using the ACR loss, which further suppresses error diffusion by reducing the model's tendency toward misclassification, thereby improving segmentation accuracy in source-free cross-domain scenarios.

(2) Evaluation of the ISS module. In this section, we verify the effectiveness of the cross-patch style generalization mechanism on the liver segmentation task. As shown in Table 6, severe performance degradation occurs when either intra-image or inter-image generalization is removed from SFDT-RSS. Specifically, without intra-image generalization, the Dice coefficient decreases by 1.64% and the ASD metric increases by 3.08; without inter-image generalization, the Dice coefficient decreases by 1.39% and the ASD metric increases by 2.27. When both are disabled, the performance loss is largest, with a Dice coefficient decrease of 5.85% and an ASD metric increase of 9.37.

Table 6. Performance gain of the source strategy in SFDT-RSS (Dice).

https://doi.org/10.1371/journal.pone.0309118.t006
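To make the cross-patch idea concrete, the per-patch channel statistics (mean and standard deviation) can be treated as a patch's "style" and exchanged between randomly paired patches, in the spirit of AdaIN-based style augmentation. The sketch below is a minimal NumPy illustration of this general technique, not the paper's exact ISS implementation; the patch size of 14 is an assumption for illustration.

```python
import numpy as np

def patch_style_swap(x, patch=14, rng=None, eps=1e-6):
    """Exchange per-patch mean/std 'style' between randomly paired
    non-overlapping patches of a (C, H, W) image (AdaIN-like swap)."""
    if rng is None:
        rng = np.random.default_rng(0)
    C, H, W = x.shape
    gh, gw = H // patch, W // patch
    # reshape into a (C, gh*gw, patch, patch) grid of patches
    p = x.reshape(C, gh, patch, gw, patch).transpose(0, 1, 3, 2, 4)
    p = p.reshape(C, gh * gw, patch, patch)
    mu = p.mean(axis=(-2, -1), keepdims=True)       # per-patch style: mean
    sd = p.std(axis=(-2, -1), keepdims=True) + eps  # per-patch style: std
    perm = rng.permutation(gh * gw)                 # random patch pairing
    out = (p - mu) / sd * sd[:, perm] + mu[:, perm]  # keep content, swap style
    out = out.reshape(C, gh, gw, patch, patch).transpose(0, 1, 3, 2, 4)
    return out.reshape(C, H, W)
```

Training on such restyled inputs penalizes any dependence of the prediction on local intensity statistics, which is the sense in which the model's style sensitivity is reduced.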

(3) Evaluation of the ACR loss. As shown in Table 7, removing the ACR loss while keeping the ISS mechanism degrades performance, with a Dice coefficient decrease of 1.50% and an ASD metric increase of 0.42, demonstrating the effectiveness of the ACR loss. Since the ACR loss is model-agnostic, we further test it by adding it to other strategies. As shown in Table 8, adding the ACR loss to both the supervised learning method TransUNet and the source-free domain transfer method DPL yields performance improvements of varying degrees. Specifically, the Dice coefficient improves by 1.05% and 0.85%, and the ASD metric decreases by 1.85 and 1.62, respectively.

Table 7. Performance evaluation of framework modules for liver segmentation tasks.

https://doi.org/10.1371/journal.pone.0309118.t007
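The paper's ACR formulation is not reproduced here; as a generic illustration of how a confidence-regulated loss damps error diffusion in self-training, the sketch below weights each pixel's pseudo-label cross-entropy by the model's own predicted confidence and discards pixels below a threshold. The threshold `tau` and the weighting scheme are illustrative assumptions, not the ACR definition.

```python
import numpy as np

def confidence_weighted_loss(probs, tau=0.75, eps=1e-8):
    """Illustrative confidence-regulated self-training loss (NOT the exact
    ACR loss): pixels whose predicted confidence exceeds tau are trained
    toward their own pseudo-label, weighted by that confidence, so that
    uncertain pixels contribute little and errors do not self-reinforce.

    probs: (K, H, W) softmax output of the source model on a target image.
    """
    conf = probs.max(axis=0)       # per-pixel confidence
    pseudo = probs.argmax(axis=0)  # per-pixel pseudo-label
    # cross-entropy of each pixel against its own pseudo-label
    ce = -np.log(np.take_along_axis(probs, pseudo[None], axis=0)[0] + eps)
    mask = conf > tau              # keep only confident pixels
    if not mask.any():
        return 0.0
    return float((conf[mask] * ce[mask]).mean())
```

Because this kind of loss depends only on the model's output probabilities, it can be bolted onto any segmenter, which is consistent with the model-agnostic behavior reported in Table 8.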

(4) Evaluation of random masking. The random mask strategy is used solely for intra-image generalization by reconstructing masked images to enhance the model’s learning ability. As shown in Table 8, using random masking increases the Dice coefficient by approximately 1.2% compared to not using it. This performance gain mainly stems from the generalization effect obtained from the self-supervised image reconstruction task, fully demonstrating the effectiveness of random masking.

Table 8. Testing the performance improvement of ACR loss on other source-free strategies for liver segmentation tasks.

https://doi.org/10.1371/journal.pone.0309118.t008
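The random masking step described above can be sketched as follows: a fixed fraction of non-overlapping patches is zeroed out, and the reconstruction network is trained to restore the hidden regions, as in masked autoencoding. The patch size and masking ratio below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def random_patch_mask(x, patch=14, ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping patches of a (C, H, W)
    image; a reconstruction head is then trained to restore the hidden
    regions (self-supervised). Returns the masked image and the mask."""
    if rng is None:
        rng = np.random.default_rng(0)
    C, H, W = x.shape
    gh, gw = H // patch, W // patch
    n = gh * gw
    hidden = rng.choice(n, size=int(n * ratio), replace=False)
    masked = x.copy()
    mask = np.zeros((H, W), dtype=bool)
    for idx in hidden:
        r, c = divmod(idx, gw)
        mask[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = True
    masked[:, mask] = 0.0  # hide the selected patches
    return masked, mask
```

A reconstruction loss (e.g. mean squared error on the masked region) would then supply the self-supervised signal that the performance gain above is attributed to.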

4.4.3 Sensitivity analysis of parameters

The impact of parameters M and K is discussed in this part. By exploring different values for these parameters, we aim to identify the optimal combination to ensure the model achieves the best segmentation performance.

Parameter M controls the frequency of image reconstruction within the ISS module. Taking the liver segmentation task LiTS → Synapse as an example, we sequentially assign values from the set {1, 2,…, 10} to M, and the results for each value are depicted in Fig 5. It can be observed that, with K = 16², the Dice coefficient shows a positive correlation with M, while the ASD metric exhibits a negative correlation. As M increases, both the Dice coefficient and the ASD metric tend to converge. However, because the image reconstruction module is time-consuming during training and the performance improvement is limited, we set M = 8 as the default value in our experiments to balance training cost and model performance.

Fig 5. Sensitivity analysis of parameters M and K.

https://doi.org/10.1371/journal.pone.0309118.g005

Parameter K governs the number and size of the blocks used for medical image partitioning. To ensure each block has the same size, we sequentially assign values from the set {112², 32², 16², 8²} to K. In other words, for an input image of size 224 × 224, the blocks have sizes {2², 7², 14², 28²} pixels, respectively. The sensitivity of parameter K is visualized in Fig 5, showing that when K = 16², the Dice coefficient reaches its maximum while the ASD metric reaches its minimum. This is because smaller block sizes are suitable only for low-resolution images, whereas larger block sizes may lead to insufficient semantic representation. Hence, we adopt K = 16² as the default setting to ensure optimal model performance.
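The correspondence between K and block size follows from simple arithmetic: splitting a 224 × 224 input into K equal square blocks gives blocks of side 224/√K, which can be checked directly.

```python
def block_side(image_size: int, K: int) -> int:
    """Side length of each block when an image_size x image_size image is
    split into K equally sized square blocks (K must be a perfect square)."""
    return image_size // int(K ** 0.5)

# K in {112^2, 32^2, 16^2, 8^2} yields block sides {2, 7, 14, 28} for a
# 224 x 224 input, i.e. block sizes {2^2, 7^2, 14^2, 28^2} pixels.
for k in (112, 32, 16, 8):
    side = block_side(224, k * k)
    print(f"K = {k}^2 -> block size {side} x {side}")
```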

5. Conclusions

This paper addresses the issue of error diffusion caused by style shift in existing unsupervised domain transfer algorithms for medical image segmentation, which arises when source domain data is not accessible during knowledge transfer. We propose a source-free domain transfer algorithm for medical image segmentation based on reducing style sensitivity. The algorithm enhances the generalization ability of the source domain model through a pre-training strategy, freeing the model from reliance on a single image style. It then employs the ACR strategy to reduce the model's misclassification probability, further mitigating error diffusion and improving segmentation accuracy. Experimental results demonstrate that the proposed method, SFDT-RSS, achieves significant Dice coefficient improvements of 2.83%, 2.64%, 3.21%, 3.01%, and 3.32% on unsupervised cross-domain liver, optic disc, and optic cup segmentation tasks across five publicly available datasets. These findings underscore the effectiveness of SFDT-RSS in source-free cross-domain medical image segmentation tasks.

References

1. Su R, Liu J, Zhang D, et al. Multimodal glioma image segmentation using dual encoder structure and channel spatial attention block[J]. Frontiers in Neuroscience, 2020, 14: 586197.
2. Hu M, Zhong Y, Xie S, et al. Fuzzy system based medical image processing for brain disease prediction[J]. Frontiers in Neuroscience, 2021, 15: 714318.
3. Zhang Y, Zhou T, Wang S, et al. Input augmentation with SAM: Boosting medical image segmentation with segmentation foundation model[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023: 129–139.
4. Shin H, Kim H, Kim S, et al. SDC-UDA: Volumetric unsupervised domain adaptation framework for slice-direction continuous cross-modality medical image segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 7412–7421.
5. Guan H, Liu M. DomainATM: Domain adaptation toolbox for medical data analysis[J]. NeuroImage, 2023, 268: 119863.
6. Zhao K, Liu Z, Zhao B, et al. Class-aware adversarial multiwavelet convolutional neural network for cross-domain fault diagnosis[J]. IEEE Transactions on Industrial Informatics, 2024, 20: 4492–4503.
7. Yu Q, Xi N, Yuan J, et al. Source-free domain adaptation for medical image segmentation via prototype-anchored feature alignment and contrastive learning[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023: 3–12.
8. Li L, Zhou Y, Yang G. Robust source-free domain adaptation for fundus image segmentation[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024: 7840–7849.
9. Yang C, Liu Y, Yuan Y. Transferability-guided multi-source model adaptation for medical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023: 703–712.
10. Liu X, Xing F, El Fakhri G, et al. Memory consistent unsupervised off-the-shelf model adaptation for source-relaxed medical image segmentation[J]. Medical Image Analysis, 2023, 83: 102641.
11. Zhao K, Liu Z, Li J, et al. Self-paced decentralized federated transfer framework for rotating machinery fault diagnosis with multiple domains[J]. Mechanical Systems and Signal Processing, 2024, 211: 111258.
12. Huang Y, Xie W, Li M, et al. Source-free domain adaptive segmentation with class-balanced complementary self-training[J]. Artificial Intelligence in Medicine, 2023, 146: 102694.
13. Fang Y, Yap P T, Lin W, et al. Source-free unsupervised domain adaptation: A survey[J]. Neural Networks, 2024: 106230.
14. Van Opbroek A, Ikram M A, Vernooij M W, et al. Transfer learning improves supervised image segmentation across imaging protocols[J]. IEEE Transactions on Medical Imaging, 2014, 34(5): 1018–1030.
15. Perone C S, Calabrese E, Cohen-Adad J. Spinal cord gray matter segmentation using deep dilated convolutions[J]. Scientific Reports, 2018, 8(1): 5966.
16. Wang S, Yu L, Li K, et al. Boundary and entropy-driven adversarial learning for fundus image segmentation[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22. Springer International Publishing, 2019: 102–110.
17. Vu T H, Jain H, Bucher M, et al. ADVENT: Adversarial entropy minimization for domain adaptation in semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2517–2526.
18. Zou Y, Yu Z, Kumar B V K, et al. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 289–305.
19. Guan H, Liu M. Domain adaptation for medical image analysis: a survey[J]. IEEE Transactions on Biomedical Engineering, 2021, 69(3): 1173–1185.
20. Chen C, Liu Q, Jin Y, et al. Source-free domain adaptive fundus image segmentation with denoised pseudo-labeling[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V 24. Springer International Publishing, 2021: 225–235.
21. Bateson M, Kervadec H, Dolz J, et al. Source-relaxed domain adaptation for image segmentation[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I 23. Springer International Publishing, 2020: 490–499.
22. Liu X, Xing F, Yang C, et al. Adapting off-the-shelf source segmenter for target medical image segmentation[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24. Springer International Publishing, 2021: 549–559.
23. Zhao Y, Zhong Z, Luo Z, et al. Source-free open compound domain adaptation in semantic segmentation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(10): 7019–7032.
24. Guo C, Pleiss G, Sun Y, et al. On calibration of modern neural networks[C]//International Conference on Machine Learning. PMLR, 2017: 1321–1330.
25. Jin Y, Wang X, Long M, et al. Minimum class confusion for versatile domain adaptation[C]//Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI 16. Springer International Publishing, 2020: 464–480.
26. Bilic P, Christ P, Li H B, et al. The liver tumor segmentation benchmark (LiTS)[J]. Medical Image Analysis, 2023, 84: 102680.
27. Chen J, Lu Y, Yu Q, et al. TransUNet: Transformers make strong encoders for medical image segmentation[J]. arXiv preprint arXiv:2102.04306, 2021.
28. Orlando J I, Fu H, Breda J B, et al. REFUGE challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs[J]. Medical Image Analysis, 2020, 59: 101570.
29. Fumero F, Alayón S, Sanchez J L, et al. RIM-ONE: An open retinal image database for optic nerve evaluation[C]//2011 24th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 2011: 1–6.
30. Sivaswamy J, Krishnadas S, Chakravarty A, et al. A comprehensive retinal image dataset for the assessment of glaucoma from the optic nerve head analysis[J]. JSM Biomedical Imaging Data Papers, 2015, 2(1): 1004.
31. Jiang J, Hu Y C, Tyagi N, et al. PSIGAN: Joint probabilistic segmentation and image distribution matching for unpaired cross-modality adaptation-based MRI segmentation[J]. IEEE Transactions on Medical Imaging, 2020, 39(12): 4071–4084.
32. Ye M, Zhang J, Ouyang J, et al. Source data-free unsupervised domain adaptation for semantic segmentation[C]//Proceedings of the 29th ACM International Conference on Multimedia. 2021: 2233–2242.
33. Klingner M, Termöhlen J A, Ritterbach J, et al. Unsupervised BatchNorm adaptation (UBNA): A domain adaptation method for semantic segmentation without using source domain representations[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2022: 210–220.
34. Karani N, Erdil E, Chaitanya K, et al. Test-time adaptable neural networks for robust medical image segmentation[J]. Medical Image Analysis, 2021, 68: 101907.
35. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
36. Jackson P T G, Abarghouei A A, Bonner S, et al. Style augmentation: Data augmentation via style randomization[C]//CVPR Workshops. 2019, 6: 10–11.