
A weak edge estimation based multi-task neural network for OCT segmentation

  • Fan Yang,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China

  • Pu Chen,

    Roles Writing – original draft, Writing – review & editing

    Affiliation School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China

  • Shiqi Lin,

    Roles Writing – original draft, Writing – review & editing

    Affiliation School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China

  • Tianming Zhan,

    Roles Writing – original draft, Writing – review & editing

    Affiliations School of Computer Science, Nanjing Audit University, Nanjing, Jiangsu, China, Center for Applied Mathematics of Jiangsu Province, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China

  • Xunning Hong,

    Roles Validation

    Affiliation The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu, China

  • Yunjie Chen

    Roles Conceptualization, Formal analysis, Project administration, Writing – original draft

    priestcyj@nuist.edu.cn

    Affiliations School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China, Center for Applied Mathematics of Jiangsu Province, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China, Jiangsu International Joint Laboratory on System Modeling and Data Analysis, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China

Abstract

Optical Coherence Tomography (OCT) offers high-resolution images of the eye’s fundus. This enables thorough analysis of retinal health by doctors, providing a solid basis for diagnosis and treatment. With the development of deep learning, deep learning-based methods are becoming more popular for fundus OCT image segmentation. Yet, these methods still encounter two primary challenges. Firstly, deep learning methods are sensitive to weak edges. Secondly, the high cost of annotating medical image data results in a lack of labeled data, leading to overfitting during model training. To tackle these challenges, we introduce the Multi-Task Attention Mechanism Network with Pruning (MTAMNP), consisting of a segmentation branch and a boundary regression branch. The boundary regression branch utilizes an adaptive weighted loss function derived from the Truncated Signed Distance Function (TSDF), improving the model’s capacity to preserve weak edge details. The Spatial Attention Based Dual-Branch Information Fusion Block links these branches, enabling mutual benefit. Furthermore, we present a structured pruning method grounded in channel attention to decrease parameter count, mitigate overfitting, and uphold segmentation accuracy. Our method surpasses other cutting-edge segmentation networks on two widely accessible datasets, achieving Dice scores of 84.09% and 93.84% on the HCMS and Duke datasets.

1 Introduction

Optical Coherence Tomography (OCT) technology can reconstruct the details of the retina and other ocular structures by obtaining high-resolution fundus images using near-infrared light and the principle of interference. The retina is primarily composed of various tissue layers, including the Nerve Fiber Layer (NFL), Inner Plexiform Layer (IPL), Inner Nuclear Layer (INL), Outer Plexiform Layer (OPL), Outer Nuclear Layer (ONL), Inner Segment (IS), Outer Segment (OS), and Retinal Pigment Epithelium (RPE). By examining the thickness of the various tissue layers, physicians can more easily assess the severity and progression of diseases such as diabetic macular edema [1], multiple sclerosis [2], and glaucoma [3]. Accurate segmentation of OCT fundus tissue is essential for calculating tissue thickness. In clinical practice, physicians often need to manually segment OCT images to determine tissue thickness. Due to common issues such as weak boundaries in OCT images, manual segmentation requires significant expertise and is time-consuming, with results that may lack reproducibility. Therefore, automated segmentation technology based on computer vision can provide physicians with stable and accurate segmentation results, offering quantitative information to aid in diagnosis.

Traditional segmentation methods have been extensively explored in the field of OCT retinal segmentation [4]. Ishikawa et al. [5] detected the peak or valley points on each A-scan using intensity gradients to locate the positions of different boundaries. They then applied curve fitting to these points to obtain continuous boundaries, which were subsequently used to generate the segmentation results. Lang et al. [6] extracted 27 features from the OCT data and input them into a random forest classifier, which generated boundary probabilities for each pixel. These probabilities were subsequently refined using a boundary refinement algorithm, leading to accurate segmentation results for exactly eight retinal layers. However, the aforementioned methods are overly reliant on carefully designed parameters, and they fail to meet clinical demands when dealing with large volumes of data [4].

With the continuous development of machine learning and image processing techniques, neural networks have emerged as a prominent research direction in domains such as image classification and object detection. To address the issues mentioned above, many deep learning-based segmentation methods have been proposed. Ronneberger et al. [7] proposed the U-Net network, which combines an encoder and a decoder and uses skip connections to improve segmentation performance. The encoder part resembles a common convolutional neural network, gradually reducing the spatial resolution of the input image and extracting high-level abstract features. The decoder part gradually restores the resolution and fuses the low-level features with the high-level features, thus improving the network accuracy.

Several enhanced models, such as UNet++ [8] and TransUnet [9], have been developed based on the U-Net architecture for the analysis of medical images. However, deep learning networks encounter two challenges in this domain. Firstly, most models heavily rely on cross-entropy loss, which fails to preserve weak edge information [10], as shown in Fig 1. Secondly, due to the need for expert knowledge and patient privacy protection, the availability of labeled medical image data is limited, leading to models that are susceptible to overfitting [11].

Fig 1. Predicted results of the TCCT [4] network.

(A) the initial image, (B) the ground truth, (C) the segmentation results.

https://doi.org/10.1371/journal.pone.0316089.g001

This paper introduces a dual-branch neural network with a shared encoder, whose two branch outputs are used for image segmentation and boundary regression, respectively. The image segmentation branch uses the U-Net network structure, while the boundary regression branch uses a truncated signed distance function as its loss function, enabling the network to learn more boundary features and preserve weak edge details. We propose a dual-branch coupling module that reduces the semantic gap between the features learned by the two branches, allowing better fusion of information between them. However, the additional branch increases the parameter count of the network, and the scarce medical image data are then inadequate to support model training, which can lead to overfitting. To solve this problem, this paper proposes a model pruning method based on channel attention. By using channel attention as a measure of convolutional kernel importance, the nonlinear relationships between convolutional kernels can be learned, unimportant kernels can be removed, and the redundancy of the model can be reduced, thereby avoiding overfitting. The main contributions of this paper are as follows:

  • We propose a novel dual-branch network that performs the image segmentation and boundary regression tasks separately. By incorporating this design, the network can effectively extract features from both the objects within the images and their boundaries, leading to improved performance in both tasks.
  • To address the issue of noise caused by artificial labels applied to weak boundaries and preserve the topological structure of retinal layers, we employ a truncated signed distance function as the loss function for the boundary regression task. We also examined the characteristics of TSDF and designed an adaptive weight to enhance the network’s focus on boundary and misclassified regions.
  • To capture non-linear interdependencies among channels and reduce the redundancy of convolutional kernels, we incorporate a channel attention layer after each convolutional layer. These layers act as a measure of the importance of each filter, allowing for effective pruning of redundant filters. As a result, the parameter count of the network is reduced, lowering the risk of overfitting.

2 Related work

2.1 Single-task learning

With the rapid development of deep learning, convolutional neural networks (CNNs) have found widespread applications in diverse fields, including image classification [12], object detection [13], and image generation [14]. In particular, CNN-based medical image segmentation networks have made significant advancements [15].

Traditional CNN architectures, like Fully Convolutional Networks (FCNs) [16], often suffer from information loss during upsampling, impacting pixel-level segmentation accuracy [7]. To mitigate this, Ronneberger et al. [7] introduced the U-Net network, which employs an encoder-decoder structure with skip connections to fuse low-level and high-level features, although a semantic gap still exists [8]. Zhou et al. [8] later enhanced this with UNet++, a hierarchical model that captures more discriminative features at various levels. Additionally, Oktay et al. [17] proposed Attention U-Net, which incorporates attention mechanisms to focus on important regions of the image. Despite these advancements, U-Net struggles to leverage global contextual information, a limitation addressed by Chen et al. [9] with TransUnet, which integrates Transformer encoders for a more comprehensive understanding of semantic details. However, OCT fundus data presents a weak boundary issue, which the above methods fail to effectively address.

To better address the weak boundary issue in OCT fundus data, researchers have introduced a series of improved U-Net-based models. To explore the applicability of U-Net and its variants in OCT fundus image segmentation, Roy et al. [10] introduced the ReLayNet network, pioneering the use of deep learning for retinal layer segmentation. However, due to weak boundary issues in OCT fundus data, segmentation results from mainstream U-Net models and their variants often struggle to preserve the topology of retinal layers accurately. To address this issue, He et al. [18] proposed SR-Net, which comprises two cascaded deep networks. S-Net is designed to learn features from the original images for initial segmentation, while R-Net uses the segmentation labels learned by S-Net as inputs to further refine boundary positions and reconstruct the segmented image. This approach helps preserve the topological structure of the segmented image. Despite this, the learning task of R-Net is not direct boundary regression, leading to the model only implicitly learning boundary surfaces [19]. To improve the network’s ability to learn boundary features, many works have introduced boundary regression as an auxiliary task for the network to learn.

2.2 Multi-task learning

Single-task learning often fails to guide the network in learning boundary features effectively. To address this limitation, multi-task learning has become a primary approach in OCT fundus image segmentation, with the introduction of boundary regression playing a crucial role. This task enhances the model’s ability to accurately delineate retinal structures, thereby preserving fine details and maintaining topological integrity.

He et al. [19] use two projection heads on Res-U-Net to produce outputs for image segmentation and boundary regression, employing soft-argmax and smooth L1 loss for the latter. This multi-task learning improves segmentation performance by enhancing boundary detail capture; however, using the same codec for both tasks may hinder the model’s ability to fully represent both features, potentially reducing accuracy. Tan et al. [4] combine CNN and transformer techniques to leverage local and global information, proposing a boundary loss function with soft-argmax. While this enhances accuracy, a semantic gap remains between features from segmentation and boundary regression, and reliance on a single decoder restricts overall model potential. Our method uses two decoders to separately learn the tasks of image segmentation and boundary regression, thereby improving the model’s ability to extract these distinct features. Wang et al. [20] subtract deep features from shallow ones to extract edge information, introducing a Canny-based feature fusion module for boundary regression. Their method employs Kullback-Leibler divergence (KL divergence) to measure boundary column coordinates but treats neighboring coordinates as independent, ignoring horizontal dependencies. To address this, our method uses TSDF for boundary regression, which captures spatial relationships among neighboring pixels, thereby improving boundary delineation and overall segmentation performance.

Recent advancements in segmentation and boundary regression aim to help models retain detail and distinguish semantic features. However, a semantic gap between tasks persists, as most studies use a single encoder-decoder structure, limiting feature capture and hindering feature fusion [21]. Our method addresses this limitation by introducing dual decoders. Additionally, current boundary regression tasks often treat columns as independent units, overlooking their correlations, which can disrupt boundary topology. To better preserve topological features, we propose a TSDF-based boundary regression loss inspired by the level set method.

2.3 Network pruning

Complex networks frequently result in an upsurge in model parameters, leading to amplified training costs and vulnerability to overfitting. This issue is particularly pronounced in medical image data analysis, where the limited capacity of training sets makes models more susceptible to overfitting. To tackle these challenges, a number of scholars have employed pruning methods to enhance model efficiency and mitigate the risks associated with overfitting [22].

The prevailing pruning approach involves selecting and removing appropriate convolution kernels, with the crucial step being the selection of the convolution kernels to be pruned. Han et al. [23] proposed a method that utilizes L1 and L2 norms to evaluate the performance of convolution kernels, allowing for the identification and elimination of less relevant ones. The L1 and L2 norms offer a straightforward and intuitive method for quantifying the importance of convolutional kernels. However, they usually assess each kernel in isolation, overlooking potential interdependencies among them. In contrast, our approach leverages channel attention layers to capture the nonlinear relationships between different convolutional kernels, thereby reducing redundancy. Pruning convolutional kernels critically impacts feature extraction. While traditional methods prune and fine-tune model parameters post-training, they underperform if pre-training is inaccurate. Dinsdale et al. [11] address this limitation with Simultaneous Training and Model Pruning, selecting and pruning kernels during training in iterative cycles, allowing for precise control over the model’s architecture. We adopt this training approach in our method to achieve more effective control over the model’s architecture and enhance parameter optimization.

In recent years, many studies in the field of OCT fundus segmentation have opted for lightweight networks to extract features, aiming to avoid overfitting due to data scarcity and to meet clinical demands. For example, the Tightly combined Cross-Convolution and Transformer (TCCT) [4] employs a CNN with a constant channel size of 32 and a lightweight transformer for feature extraction, and ReLayNet employs a CNN with a constant channel size of 64. However, this model simplification inevitably compromises the ability to extract features effectively. To effectively prune unimportant convolutional kernels while preserving model accuracy, we propose a channel attention-based model pruning strategy.

3 Proposed model

3.1 Overall

In order to enhance the accuracy of network segmentation, we propose a dual-branch architecture for image segmentation and boundary regression. In the boundary regression branch, we introduce a signed distance to characterize boundary geometric features, which helps guide the network to focus on boundary feature information. This approach overcomes the limitations of weak boundaries and preserves the topology of the retinal layers.

To ensure consistency of the network features and reduce the number of model parameters, both tasks share a common encoding process, while decoding proceeds along two different paths. The overall network structure, shown in Fig 2, follows a design similar to the Res-UNet network, consisting of four layers for encoding and decoding. Each coding layer consists of a residual block and channel attention (ResAttBlock); its specific structure is shown in Fig 3. These modules include convolutional layers, channel attention layers, batch normalization, and Rectified Linear Unit (ReLU) activation layers. The encoding layers are connected using max pooling for downsampling.

Fig 2. The structure of the proposed network.

The proposed network is based on U-Net, consisting of a segmentation branch and a boundary regression branch. The segmentation branch extracts the overall feature information of the target, while the boundary regression branch focuses on extracting information related to the target’s boundaries.

https://doi.org/10.1371/journal.pone.0316089.g002

The high-level features obtained from the encoding process are fed into the two branches of the decoder in order to learn separate features. Both branches utilize a decoding structure similar to that of Res-UNet. Each branch of the network comprises four decoders, all of which are implemented as ResAttBlock. The outputs derived from each decoder in both branches are upsampled using bilinear interpolation, thus extracting features at different stages of the two paths, denoted as segmentation features (Si) and boundary features (Bi), respectively.

In order to effectively blend the feature information from both branches, we introduce a spatial attention based dual-branch information fusion block (SADBIFB), which addresses two important challenges at the same time. First, it helps reduce the semantic gap between encoding and decoding features. Second, it effectively fuses the decoding features from both the segmentation and regression branches, thus promoting a complementary relationship between the two branches. The SADBIFB couples the features obtained from the decoding layer (Fi), the features obtained from the segmentation branch (Si), and the features obtained from the boundary regression branch (Bi).

Ultimately, the outputs of the two branches are utilized for the image segmentation and boundary regression tasks. The output of the image segmentation branch indicates the probability of each pixel belonging to a specific category, with the pixel classified into the category with the highest probability. The output of the boundary regression branch indicates each pixel’s shortest distance to the category boundaries, known as the Truncated Signed Distance Function (TSDF) [24]. We employ the TSDF to guide the network in learning boundary features and to enhance its focus on the boundary regions.

The utilization of a dual-branch network expands the number of model parameters, while labeled data for medical image analysis remain scarce, leaving deep networks prone to overfitting. In order to tackle these challenges, we propose a model pruning strategy based on channel attention. This strategy reduces the chances of overfitting and concurrently decreases the number of parameters in the model.

3.2 Spatial attention-based dual-branch information fusion block

In this section, we present SADBIFB as a solution to bridge the semantic gap between the two branches and effectively combine the feature information they provide. The structure of the SADBIFB is illustrated in Fig 4.

The inputs of SADBIFB consist of feature maps from the decoding layer (Fi), the features obtained from the segmentation branch (Si), and the features obtained from the boundary regression branch (Bi). These feature maps are of shape (c × h × w), where c represents the number of channels, h the image height, and w the image width. To enhance the network’s ability to exploit spatial information, this study incorporates a spatial attention gate [17], which selectively focuses on informative spatial regions during the information fusion process.

In the segmentation branch, we utilize two spatial attention gates to extract effective encoded features and boundary regression features; the same approach is taken in the boundary regression branch. For instance, in the segmentation branch, the boundary regression feature spatial attention gate (shown as the red area in Fig 4) is used. This gate applies spatial attention to the features in Si and Bi to extract the spatial importance of the feature information. This spatial importance is then applied to Bi to extract the important boundary regression features in the segmentation branch, which provides more effective boundary regression feature information. The specific structure of this process is depicted in Fig 5. The attention gate function G(Si, Bi) of this example is defined as follows:

G(Si, Bi) = σ(ψ δ(Ws Si + Wb Bi)) ⋅ Bi (1)

Fig 5. Schematic of the spatial attention gate [17].

Taking the segmentation branch as an example, this gate extracts the information from the boundary regression branch Bi that is crucial for the segmentation branch.

https://doi.org/10.1371/journal.pone.0316089.g005

Here, Ws, Wb ∈ ℝ^(c×c) and ψ ∈ ℝ^(1×c) are the parameters that need to be learned, and c represents the number of channels in the feature maps.

The function takes two inputs, Si and Bi, and computes the attention gate output. The attention gate function involves two activation functions: δ and σ. The activation function δ is the ReLU, which ensures that only positive values are passed through. The activation function σ is the sigmoid function, which squashes input values between 0 and 1. The element-wise multiplication operator (⋅) performs a pointwise multiplication between the weight vector and the feature map along the channel dimension. This weights the importance of the boundary regression features and improves the ability of the segmentation branch. The other three spatial attention gates follow the same structure as this example.

Using the segmentation branch as an example, we employ a spatial attention mechanism to extract features from encoding features and boundary regression features. These features are then concatenated with segmentation features in the channel dimension. The fused features are further processed through convolutional layers.
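As a concrete illustration, the gate described above can be sketched in NumPy. The shapes and the 1×1-convolution weights `Ws`, `Wb`, and `psi` below are illustrative assumptions, not the trained parameters of the actual network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(S, B, Ws, Wb, psi):
    """Sketch of the spatial attention gate G(Si, Bi) in Eq. (1).

    S, B: (c, h, w) feature maps from the segmentation and boundary branches.
    Ws, Wb: (c, c) weights of 1x1 convolutions; psi: (1, c) projection.
    Returns the gated boundary features alpha * B.
    """
    c, h, w = S.shape
    flat_S = S.reshape(c, -1)                       # (c, h*w)
    flat_B = B.reshape(c, -1)
    q = np.maximum(Ws @ flat_S + Wb @ flat_B, 0.0)  # delta = ReLU
    alpha = sigmoid(psi @ q)                        # (1, h*w) spatial attention map
    return (alpha * flat_B).reshape(c, h, w)        # broadcast over channels
```

Because alpha lies in (0, 1), the gate can only attenuate boundary features, never amplify them, which matches the "selective focus" role described above.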

3.3 Truncated signed distance function for boundary regression

In this method, we calculate the distance between each pixel and the nearest boundary pixel for each category Ωk. In order to make the network more focused on the features of the boundary, we use the truncated signed distance function (TSDF) as the objective function for the boundary regression branch. The TSDF is defined as follows:

TSDFk(x) = −min(d(x, ∂Ωk), τ) if x ∈ Ωk;  min(d(x, ∂Ωk), τ) if x ∉ Ωk,  where d(x, ∂Ωk) = min_{ω∈∂Ωk} d(x, ω) (2)

Here, x represents a pixel in the image, τ represents the truncation distance, and ∂Ωk is the set of pixels that form the boundary of the kth target. The function d(x, ω) calculates the Euclidean distance between the pixel x and the pixel ω. The TSDF assigns negative values to pixels inside the target Ωk and positive values to pixels outside Ωk. The negative value represents the distance to the boundary for pixels inside the target, whereas the positive value represents the distance to the boundary for pixels outside the target. Next, for training stability, we normalize the distance using the following formula:

d̃k(x) = TSDFk(x) / τ (3)

Here, d̃k(x) represents the normalized distance, which we refer to as dk for convenience. This normalization scales the truncated distance values to the range [-1, 1]. We observed that the difference between the maximum and the second-largest values of the TSDF, computed by the network for pixels near the boundary and in misclassified regions, is small. The former is due to the inherent characteristics of the true TSDF at those pixels, while the latter indicates that the network has not correctly learned the topology of the image. To encourage the network to focus more on boundaries and misclassified regions, we propose an adaptive weighting scheme based on information entropy. This scheme leverages the maximum and second-largest values of the TSDF, as described by the following formulas:

dmax1(x) = max_k dk(x) (4)

dmax2(x) = max_{k ≠ kmax1} dk(x) (5)

where kmax1 is the index that maximizes dk(x):

kmax1 = argmax_k dk(x) (6)

For each pixel x, the two maxima dmax1(x) and dmax2(x) are converted to probability values using the softmax function:

p1(x) = exp(dmax1(x)) / (exp(dmax1(x)) + exp(dmax2(x))) (7)

p2(x) = exp(dmax2(x)) / (exp(dmax1(x)) + exp(dmax2(x))) (8)

Based on these probability values, the information entropy w(x) is calculated:

w(x) = −(p1(x) log p1(x) + p2(x) log p2(x)) (9)
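The TSDF labels and entropy weights described above can be computed offline from the ground-truth masks. The following NumPy sketch illustrates one way to do this for a single class; it uses a brute-force distance computation for clarity (a distance transform would be used in practice) and assumes the mask contains at least one boundary pixel:

```python
import numpy as np

def truncated_sdf(mask, tau):
    """Truncated signed distance for one class Omega_k (Eqs. 2-3).

    mask: boolean (h, w) array, True inside Omega_k.  Boundary pixels are
    inside pixels with at least one 4-neighbour outside.  Values are
    negative inside, positive outside, truncated at tau, scaled to [-1, 1].
    """
    h, w = mask.shape
    pad = np.pad(mask, 1, constant_values=False)
    interior = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
    boundary = mask & ~interior                     # inside pixels touching outside
    by, bx = np.nonzero(boundary)
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.sqrt((ys[..., None] - by) ** 2 + (xs[..., None] - bx) ** 2).min(axis=-1)
    d = np.minimum(d, tau)                          # truncation, Eq. (2)
    return np.where(mask, -d, d) / tau              # sign + normalisation, Eq. (3)

def entropy_weight(d):
    """Adaptive weight w(x) from the two largest TSDF values (Eqs. 4-9).

    d: (K, h, w) stack of per-class normalised TSDFs.  Pixels where the two
    largest values are close (boundaries, ambiguous regions) get weights
    near log 2; confident pixels get weights near 0.
    """
    s = np.sort(d, axis=0)
    d1, d2 = s[-1], s[-2]                           # largest and second largest
    e1, e2 = np.exp(d1), np.exp(d2)
    p1, p2 = e1 / (e1 + e2), e2 / (e1 + e2)         # softmax, Eqs. (7)-(8)
    return -(p1 * np.log(p1) + p2 * np.log(p2))     # entropy, Eq. (9)
```

On the zero level set both class distances vanish, so the softmax is uniform and the weight reaches its maximum of log 2, exactly the boundary emphasis the scheme is designed to provide.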

Fig 6 illustrates the boundary labels employed in [4, 19], which are presented in the second row. The third row displays the SDF labels that we utilize, along with the adaptive weights applied. Due to the unique topology of the retinal layers, He et al. [19] employed one-hot encoding for boundary labels and regressed the boundary coordinates of each column. They utilized KL divergence loss to enable the network to learn multiple independent distributions of the column coordinates. In contrast, Tan et al. [4] highlighted that there is inherent uncertainty in the labels. They employed soft-argmax to enable the model to learn a smooth distribution with higher probability near the boundaries. However, they noted that the boundary coordinate distributions of different columns are not independent, and there is a significant degree of dependence between adjacent columns. The SDF describes the distance from each point in the image to the target boundary: distances inside the target are negative, while those outside are positive, with the zero level set representing the target boundary (marked in black in the figure). This approach makes it easier for the network to capture boundary features. Additionally, the SDF better preserves slender topologies, leading to improved segmentation in thinner retinal layers. It is worth noting that our method does not generate spurious boundaries in complex boundary areas.

Fig 6. Visualized results of different boundary labels.

The first row presents the OCT fundus image and its corresponding pixel-level label. The second row shows the boundary label and the label smoothed with a Gaussian kernel. The third row features the TSDF-based label (The boundary (zero TSDF) is represented as black) and the information entropy weights.

https://doi.org/10.1371/journal.pone.0316089.g006

The loss function in our model is composed of the segmentation loss (Lseg) and the boundary regression loss (Lbou), and it is calculated as follows:

L = Lseg + λ Lbou (10)

where λ is a weight balancing the two tasks.

Here, Lseg represents the segmentation loss, which is computed using the cross-entropy loss and Dice loss for K classes. The equations for Lseg are given as:

Lseg = LCE + LDice (11)

LCE = −(1/N) Σ_{n=1}^{N} Σ_{k=1}^{K} y_{n,k} log ŷ_{n,k} (12)

LDice = 1 − (2 Σ_n Σ_k y_{n,k} ŷ_{n,k}) / (Σ_n Σ_k y_{n,k} + Σ_n Σ_k ŷ_{n,k}) (13)

In the equations above, N represents the number of pixels in the image, y_{n,k} is the ground truth, and ŷ_{n,k} is the output of the segmentation branch. The boundary regression loss Lbou is computed as the mean square error (MSE) between the output of the boundary regression branch and the truncated signed distance function, weighted by the adaptive entropy weight w(x). The equation for Lbou is given as:

Lbou = (1/N) Σ_x w(x) (d̂(x) − d(x))² (14)

Here, d̂(x) is the output of the boundary regression branch, and d(x) represents the truncated signed distance function of the original label.
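Putting the pieces together, the combined loss can be sketched in NumPy as follows. The balance weight `lam` and the exact reduction are assumptions of this sketch, not values stated by the paper:

```python
import numpy as np

def total_loss(p, y, d_pred, d_true, w, lam=1.0):
    """Sketch of the combined loss L = Lseg + lam * Lbou (Eqs. 10-14).

    p: (K, N) predicted class probabilities, y: (K, N) one-hot labels,
    d_pred, d_true: (N,) boundary-branch output and TSDF target,
    w: (N,) adaptive entropy weights, lam: assumed task-balance weight.
    """
    eps = 1e-8
    ce = -np.mean(np.sum(y * np.log(p + eps), axis=0))                       # Eq. (12)
    dice = 1.0 - (2 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)   # Eq. (13)
    l_seg = ce + dice                                                        # Eq. (11)
    l_bou = np.mean(w * (d_pred - d_true) ** 2)                              # Eq. (14)
    return l_seg + lam * l_bou                                               # Eq. (10)
```

A perfect prediction drives all three terms to (near) zero, while the entropy weights w concentrate the regression penalty on boundary and ambiguous pixels.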

3.4 Model pruning with channel attention

The number of parameters in our proposed two-branch network has increased by approximately 5.68 million parameters compared to the original U-Net network. Furthermore, a limited medical image training set can lead to overfitting. To address this issue, we propose a channel attention-based pruning scheme. According to the lottery ticket hypothesis [22], a dense and large network contains a smaller subnetwork that can achieve equal or higher accuracy through training. By removing redundant parameters, we can significantly reduce the total parameter count in the model without compromising its accuracy. This approach helps mitigate the problem of overfitting.

Deep networks frequently employ a large number of convolutional kernels in their network architecture to improve their capacity for capturing image features. However, this strategy often results in parameter redundancy. In order to tackle the limitation of previous research, Behrad et al. [25] proposed a method to reduce parameter redundancy by utilizing the L1 and L2 norms to determine the importance of each feature channel. By removing the convolutional kernels corresponding to low importance channels, the researchers successfully reduced the number of model parameters. However, this approach only considers the importance of convolutional kernels based on their corresponding channels, neglecting the nonlinear relationship between these kernels. As a result, it fails to effectively eliminate redundant information between channels [26]. To address this limitation, we introduce the channel attention mechanism in this study. By incorporating the channel attention mechanism, the model can autonomously learn and select the most valuable feature channels. This allows for the capture of nonlinear relationships between these channels, specifically the nonlinear connections between convolutional kernels.

As depicted in Fig 7, the CA model adds a layer of channel attention after each convolutional layer. Let Xi denote the input feature map of the ith CA model, which is followed by the (i + 1)th CA model in each branch; Wi is the convolutional kernel, and X̃i is the corresponding output. To mitigate the influence of spatial information on channel attention, global max pooling and global average pooling are initially applied to compress the feature map from C × W × H to C × 1 × 1, as illustrated in the following formula:

zic = (1/(W × H)) Σ_{m=1}^{W} Σ_{n=1}^{H} Xc(m, n) + max_{m,n} Xc(m, n) (15)

Here, zic represents the information in the cth channel of the feature map zi, and Xc(m, n) represents the value at position (m, n) in the cth channel of the feature map. Then, non-linear interactions between channels are learned through the next two fully connected layers, as shown in the following formula:

αi = σ(W2 δ(W1 zi)) (16)

In this formula, W1 ∈ ℝ^((C/r)×C), W2 ∈ ℝ^(C×(C/r)), and r represents the compression ratio (typically set to 4). αi represents the weights assigned to each channel. Finally, the weights αi are broadcast over the spatial dimensions and multiplied element-wise with the features to acquire X̃i.
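A NumPy sketch of one CA layer follows. It is SE-style; summing the two pooled vectors before the bottleneck is an assumption of this sketch:

```python
import numpy as np

def channel_attention(X, W1, W2):
    """Sketch of a channel attention (CA) layer, Eqs. (15)-(16).

    X: (C, H, W) feature map; W1: (C//r, C) and W2: (C, C//r) are the
    fully connected weights with compression ratio r.
    Returns the per-channel weights alpha and the reweighted features.
    """
    z_avg = X.mean(axis=(1, 2))                     # global average pooling
    z_max = X.max(axis=(1, 2))                      # global max pooling
    z = z_avg + z_max                               # assumed fusion, Eq. (15)
    hidden = np.maximum(W1 @ z, 0.0)                # delta = ReLU bottleneck
    alpha = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))    # sigma = sigmoid, Eq. (16)
    return alpha, alpha[:, None, None] * X          # broadcast over H, W
```

The vector alpha is exactly the per-kernel importance signal that the pruning procedure in the next step accumulates over batches.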

Fig 7. Schematic of the model pruning.

The pink channels in the figure represent the channels that need to be pruned.

https://doi.org/10.1371/journal.pone.0316089.g007

Before evaluating the importance of convolutional kernels, the model is pre-trained for 100 epochs so that it can initially learn from the data. The importance of each convolutional kernel is determined by the attention αi of its corresponding channel. Because the channel attention αi is data-driven, it is not fixed after training; rather, it varies with the input data. To overcome this, the dataset is divided into batches of equal size. Each batch is fed into the network to obtain the αi values of each convolutional layer for that batch, and the final importance score of each kernel is the sum of the αi values over all batches. To obtain a subnetwork, the kernels are sorted by importance in ascending order and the r% with the lowest scores are removed. Zhu et al. [27] found, based on the feature similarity matrix, that the features of shallow channels are more diverse than those of deep channels. We therefore group the kernels by depth and rank them within each group, pruning the bottom r% of the least important kernels in each group. After fine-tuning, we again prune r% of the remaining kernels, repeating this process five times, and retain the model with the best performance.
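The batch-accumulated importance score and the selection of the least important kernels for one layer can be sketched as follows (a simplified illustration; the function name and the toy α values are hypothetical):

```python
import numpy as np

def select_prune_indices(alpha_per_batch, r):
    """Sum per-batch channel attentions into importance scores and pick the bottom r fraction."""
    importance = np.sum(alpha_per_batch, axis=0)   # accumulate alpha over all batches
    k = int(len(importance) * r)                   # number of kernels to remove
    order = np.argsort(importance)                 # ascending: least important first
    return sorted(order[:k].tolist())

# Toy alpha values from two batches for a 4-channel layer
alphas = [np.array([0.9, 0.1, 0.5, 0.05]),
          np.array([0.8, 0.2, 0.4, 0.10])]
pruned = select_prune_indices(alphas, r=0.5)       # indices of the two least important kernels
```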

Fig 7 illustrates the process of pruning a convolutional kernel. Once a kernel is identified for removal, we prune it along with its associated feature channel. Additionally, we adjust the number of input channels of each convolutional kernel in the next CA model to maintain the consistency of the network architecture.

However, it is important to note that the subnetwork obtained from the previous step is highly sensitive to initialization. To preserve the image features learned by the original network as effectively as possible, the training parameters of the original network are assigned to the unpruned parameters in the subnetwork as initialization. This ensures that the learned knowledge is retained. The subnetwork is then fine-tuned for an additional 20 epochs to further optimize its performance, resulting in the final model.

4 Experimental setup

4.1 Datasets

The data utilized in this study come from two datasets, namely the HCMS (https://iacl.ece.jhu.edu/index.php?title=Resources) and DUKE (https://people.duke.edu/~sf59/Chiu_BOE_2014_dataset.htm) datasets.

The HCMS dataset provides fundus OCT images of 21 multiple sclerosis (MS) patients and 14 healthy controls (HC), with 49 B-scans per subject, for a total of 1,715 images of size 496 × 1024 pixels. To reduce the computational effort, the images were flattened using the same method as in [6]; we then cropped the redundant choroid and vitreous body and resized the images to 256 × 512. We selected the last six HC volumes and the last nine MS volumes for training the model, reserving the remaining 20 volumes for testing.

The DUKE dataset includes 10 diabetic macular edema patients, each with 11 B-scans, for a total of 110 images of 496 × 768 pixels each. We used the first five patients for training and the last five for testing. Notably, we did not apply a flattening operation to this dataset; instead, we directly cropped the excess choroid and vitreous and resized the images to 256 × 512.

The HCMS dataset includes retinal images from different patients, containing both healthy retinal images and images with pathological regions. This diversity allows the model to be exposed to a broader range of image types during training, enabling it to learn how to segment various retinal regions. On the other hand, the DUKE dataset specifically focuses on diabetic macular edema (DME) lesions, which is crucial for training the model’s ability to segment specific pathological areas. Conducting experiments on these two datasets ensures that the model can maintain high segmentation accuracy when faced with different types of images and various pathological conditions, thereby validating its generalization ability.

4.2 Training and evaluation metrics

To quantitatively evaluate our model and compare it with U-Net [7], Attention U-Net [17], UNet++ [8], Res-UNet [28], TransUnet [9], FCRN [19] and RelayNet [10], we employed the Dice score and IoU as evaluation criteria. Both quantify the similarity between model predictions and the corresponding labels: the IoU divides the intersection of the predicted and labeled regions by their union, while the Dice score divides twice the intersection by the sum of the sizes of the two regions. We calculated the Dice score for each category and report the average Dice and IoU scores over all categories.
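As a concrete reference, the two metrics for a single class can be computed as follows (a minimal sketch over toy boolean masks):

```python
import numpy as np

def dice_iou(pred, target):
    """Dice and IoU for one class; pred and target are boolean masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    iou = inter / union
    return dice, iou

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
d, i = dice_iou(pred, target)   # intersection 2, union 4
```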

4.3 Training details

We perform five-fold cross-validation on the training set, using 80% of the data for training and 20% for validation in each fold. The model selected for the final evaluation is the one that achieves the highest mean Dice score (mDice) on the validation set. To manage computational costs, we resize both datasets to 256 × 512 pixels. During training, we apply random cropping to 256 × 256 pixels; for validation and testing, we retain the 256 × 512 size. Our data augmentation strategy includes random horizontal and vertical flips, as well as random brightness and contrast adjustments, each with a probability of 0.5. We use the Adam optimizer with a weight decay of 1e-5 and a cosine annealing schedule that reduces the learning rate from 5e-4 to 5e-6. The batch size is 8 during training and 1 during testing, and all experiments are trained for 100 epochs.
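The cosine annealing schedule can be written out explicitly, for example as below (a sketch assuming one update per epoch, decaying from 5e-4 to 5e-6 over the 100 training epochs):

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max=5e-4, lr_min=5e-6):
    """Cosine annealing: lr_max at epoch 0, decaying smoothly to lr_min at total_epochs."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

lrs = [cosine_annealing_lr(e, 100) for e in range(101)]
```

In practice the same schedule is available as `CosineAnnealingLR` in common deep-learning frameworks; the closed form above is what it evaluates.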

5 Experimental results and discussion

5.1 Segmentation results

To demonstrate the superiority of our model, we compared it with other popular image segmentation models, which fall into three groups: a traditional model, single-task models, and dual-task models. The traditional model is AUtomated Retinal Analysis (AURA) [6]. The single-task models include U-Net [7], Attention U-Net [17], UNet++ [8], Res-UNet [28], and TransUnet [9]. The dual-task networks specifically designed for retinal layer segmentation include ReLayNet [10] and FCRN [19].

The segmentation results are shown in Table 1 for the HCMS dataset and in Table 2 for the DUKE dataset. Following the conventional setting, we set the number of channels for the single-task networks to [32, 64, 128, 256, 512]. For the multi-task networks, we adhere to the original papers' settings, which use a lightweight configuration with all channel numbers set to 64. Our network is configured as [32, 64, 128, 128, 128]. To compare the effect of parameter count on model accuracy, we also trained the FCRN network with channel numbers [32, 64, 128, 256, 512].

Table 1. Dice scores for each tissue type in the HCMS dataset, as well as the average Dice score and average IoU for each tissue type.

https://doi.org/10.1371/journal.pone.0316089.t001

Table 2. Dice scores for each tissue type in the DUKE dataset, as well as the average Dice score and average IoU for each tissue type.

https://doi.org/10.1371/journal.pone.0316089.t002

The U-Net network [7] is currently the dominant method in the field of medical image segmentation, and most current segmentation networks are variations of it. To address U-Net's tendency to overfit, Res-UNet [28] incorporates residual modules to capture residual feature information. UNet++ [8] uses multi-layer nested networks to bridge the semantic gaps between encoder and decoder that the original U-Net skip connections cannot overcome, resulting in improved accuracy. By employing skip connections based on a spatial attention mechanism, Attention U-Net [17] reduces semantic redundancy between encoding and decoding, further enhancing segmentation accuracy. TransUnet [9] leverages the Transformer mechanism to process the final result of the encoding layers and incorporate spatial information, leading to improved segmentation accuracy. However, this method only extracts features from the final encoding result and introduces additional parameters.

To enhance the network's ability to extract boundary features, RelayNet [10] is designed with an adaptive weighting mechanism that takes into account both the image gradient magnitude and the number of categories. This approach enhances the model's sensitivity to boundaries and increases the emphasis on foreground regions, leading to more accurate segmentation results. FCRN [19] employs a dual-head projection to obtain boundary feature information while performing segmentation, but its single-branch design limits the accuracy improvement.

Our proposed method utilizes a coupling mechanism that integrates segmentation and boundary regression, enabling us to restore boundary information while performing segmentation. The synergy between the two components results in optimal outcomes.

From Table 1, we can observe that mainstream segmentation networks have achieved notable results. Among them, the Res-UNet network stands out by expanding the receptive field through residual connections with convolutions using different dilation rates. This approach effectively mitigates the gradient vanishing and explosion issues caused by having too many channels, thereby improving model accuracy, making Res-UNet the most effective among mainstream segmentation networks.

However, it is noteworthy that the number of parameters in TransUnet far exceeds that of other networks, yet its IoU only reaches 83.34%, making it less favorable for clinical use. In networks specifically designed for fundus segmentation, we set the number of channels to 64 following the original paper’s settings. Consequently, the parameter counts for RelayNet (tiny) and FCRN (tiny) were 1.26M and 1.38M, respectively. However, the limited number of parameters restricted the model’s capability.

To verify our hypothesis, we modified the number of channels in FCRN to [32, 64, 128, 256, 512], resulting in a total of 13.9M parameters. Experimental results showed that the IoU improved by 0.45% compared to the tiny version of FCRN. Our method achieved the best results across all evaluated datasets, with the number of parameters being only 2.07M.

The DUKE dataset, characterized by edema and severe deformation, coupled with its smaller size and some missing annotations, led to significantly lower accuracy for all models compared to the HCMS dataset. It is also important to note that we did not flatten the DUKE dataset, leaving a large amount of background area in the images, which may be one reason for the mediocre results of both the compared methods and our own; all experiments, however, were performed in a uniform setup. From Table 2, it can be seen that UNet++ achieves higher accuracy on this complex deformation data, primarily due to its ability to fuse multi-scale features. Additionally, the boundary regression task of the FCRN network somewhat enhances segmentation accuracy on slender topologies. Our network further improves learning on complex deformation data through its dual-branch architecture and weighted TSDF approach, achieving a 3.57% improvement in IoU compared to U-Net.

To demonstrate the effectiveness of our method, we provide specific segmentation results on two datasets, as depicted in Figs 8 and 9. The figure showcases various image samples and their corresponding segmentation outputs. The first column represents the original images, while the subsequent columns depict the ground truth and the results obtained from U-Net [7], UNet++ [8], Res-UNet [28], Attention U-Net [17], TransUnet [9], ReLayNet [10], FCRN [19] and our proposed method.

Fig 8. Segmentation results of HCMS dataset.

From left to right, the columns display the initial image, ground truth, and the segmentation results of AURA, U-Net, UNet++, Res-UNet, Attention U-Net, ReLayNet, FCRN, and our method, respectively.

https://doi.org/10.1371/journal.pone.0316089.g008

Fig 9. Segmentation results of DUKE dataset.

From left to right, the columns display the initial image, ground truth, and the segmentation results of U-Net, UNet++, Res-UNet, Attention U-Net, FCRN, and our method, respectively.

https://doi.org/10.1371/journal.pone.0316089.g009

The retinal layer images exhibit numerous depressed and elevated areas, while the OCT images are affected by noise and weak borders. Fig 8 shows some of the hard-to-segment cases in the HCMS data. First, the top row displays a depressed region of the retinal layer where the IPL, INL, and OPL are extremely narrow, leading to disrupted segmentation structures in these areas. The results indicate that AURA effectively preserves the central INL structure, but its boundary positioning in the concave area may deviate, reducing segmentation accuracy. In contrast, U-Net, Attention U-Net, and RelayNet often fail to capture the central INL. UNet++, Res-UNet, and FCRN can identify these elongated structures, but none of them preserve the original topology effectively. Our method, however, maintains these elongated topologies well. Second, the second row illustrates an uneven, irregular retinal layer with low contrast. Traditional methods struggle to accurately locate boundary points under low contrast, resulting in jagged boundaries and decreased segmentation accuracy, and mainstream segmentation algorithms struggle to accurately preserve the positional relationships between the retinal layers. Although RelayNet and FCRN, which are designed specifically for retinal segmentation, maintain the approximate hierarchical relationships, mis-segmented pixels remain. In contrast, our method preserves these details more effectively. Third, the third row illustrates the impact of highlighted regions on the segmentation structure. Traditional methods are affected by these high-intensity regions, producing abnormal protrusions and depressions at the boundaries, and the pixel gradients they generate can lead the network to learn incorrect features. As shown in the figure, the highlighted region disrupts the network's learning of the OPL and ONL layer structure, whereas only our network maintains the correct structure.

Fig 9 shows the complex structures of the DUKE dataset. In the first row, we highlight a region with weak boundaries. Most mainstream networks struggle to accurately recognize these slender layers, while our method effectively segments the weakly bounded areas; this performance is consistent with the results observed on the HCMS dataset. In the second row, we present images with significant skewing, a result of not flattening the DUKE dataset. The figure reveals a downward prominence in the OPL, which most methods, including ours, struggle to capture accurately. However, the latter three methods, specifically designed for fundus segmentation, perform better than the single-task methods. In the depressed region on the right, only our method successfully maintains the hierarchical structure.

5.2 Ablation experiments

In our proposed method, we introduce a double task network that aims to achieve target segmentation and boundary regression simultaneously. To effectively leverage the segmentation features and boundary features, we present a spatial attention mechanism that combines feature information from both tasks, enhancing the network’s ability to extract target and boundary features. In the boundary regression branch, we utilize the original labeled TSDF function as the task, which serves as a guide for the network to learn boundary features more effectively. To ensure the network focuses more on the spatial relationships near the boundaries, we set the TSDF threshold to 5. Additionally, adaptive weights are designed to enhance attention to both boundary regions and areas with misclassification. To address the issue of limited training samples in medical imaging, we also propose a channel attention mechanism to prune the model, reducing the risk of network overfitting. This allows for more robust and reliable predictions. In the following sections, we will conduct ablation experiments on each of these improvements to demonstrate their effectiveness.

5.2.1 Ablation experiment for double task.

In this section, we conduct ablation experiments on the improved parts. First, we analyze the effect of the dual-task design on model accuracy. Table 3 presents the experimental results of our method compared to Res-UNet, Res-UNet with a dual-head projection (DH U-Net), double-branch Res-UNet (DB U-Net), double-branch Res-UNet with feature concatenation (DB U-Net+Concat), and double-branch Res-UNet with SADBIFB (DB U-Net+SADBIFB). It can be observed that the dual-task design of DH U-Net allows the network to capture more boundary features, improving segmentation accuracy. However, this method does not couple information within the network, which limits the improvement.

On the other hand, the DB U-Net+Concat method directly concatenates the features of the segmentation branch and the boundary regression branch, thereby achieving more accurate results.

In this article, we propose a feature fusion module based on spatial attention to effectively utilize both segmentation features and boundary regression features. By promoting the complementarity of these features, we achieve the best results with a further increase in accuracy.

5.2.2 Ablation experiment for boundary regression loss.

This section aims to provide a detailed analysis of the objectives of boundary regression. FCRN [19] regresses the boundary coordinates of each column using the cross-entropy loss function. While using only boundaries as regression targets can enhance the network’s ability to extract boundary features, it can also result in significant loss when the predicted results differ from the actual results by just one pixel. This heavy reliance on label results can be problematic for medical imaging, where labels at weak boundaries often contain uncertainty and noise. As a result, the network may overfit to this label noise.

To address this issue, Tan et al. [4] propose a regression objective based on soft-argmax on the boundary. This approach effectively reduces the interference of label noise and yields better results. However, due to the isotropy of the convolution kernel, it tends to generate pseudo boundaries at complex boundaries. To overcome this challenge, we introduce a regression objective based on the truncated signed distance function, which effectively minimizes the generation of pseudo boundaries and achieves more accurate results. In our experiments, we observed that the nine TSDF values produced by the boundary regression branch for the same pixel tend to be very similar, particularly near the boundary and in regions where the network has not accurately learned the topology. To address this issue, we designed the adaptive weights described in the main text, which are visualized as shown in Fig 10.
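For intuition, a column-wise TSDF target for a single boundary surface can be sketched as follows (an illustrative 1-D version, assuming the signed distance is measured along the A-scan direction and truncated at the threshold of 5 used in our method; the sign convention and names are illustrative):

```python
import numpy as np

def truncated_sdf(boundary_row, height, tau=5.0):
    """Signed distance of each pixel row in a column to the boundary, truncated at +/- tau."""
    rows = np.arange(height, dtype=float)
    sdf = rows - boundary_row          # negative above the boundary, positive below it
    return np.clip(sdf, -tau, tau)

tsdf = truncated_sdf(boundary_row=8.0, height=20, tau=5.0)
```

Because the target varies smoothly across the boundary instead of jumping at a single pixel, a one-pixel annotation error changes the regression loss only slightly, which is what makes the TSDF more robust to label noise than direct boundary-coordinate regression.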

Fig 10. The hotmap of adaptive weight.

The figure displays, in order: the original image, labels, adaptive weights, results obtained using TSDF loss, and results obtained using weighted TSDF loss.

https://doi.org/10.1371/journal.pone.0316089.g010

Fig 10 shows the heat maps of the adaptive weights, together with the results obtained using the TSDF loss and the weighted TSDF loss. We can observe that the unweighted TSDF loss fails to capture the correct topology at some weak boundaries and in diseased regions; consequently, our adaptive weights are increased in these areas, and there is also a high response near the boundaries. The weighted TSDF loss emphasizes these regions, allowing the network to better learn the complex topology.

We performed ablation experiments on several of these loss functions, uniformly using our proposed model for fairness. Table 4 shows the final segmentation results obtained with the different boundary regression targets; our method improves the IoU by 1.06%.

Table 4. Evaluation of the effectiveness of boundary regression target.

https://doi.org/10.1371/journal.pone.0316089.t004

5.2.3 Ablation experiment for pruning.

In medical image analysis, the scarcity of annotated data often results in network overfitting and increased computational costs, posing challenges for clinical deployment. Furthermore, a dual-task model necessitates a larger number of parameters. RelayNet and FCRN address this with lightweight configurations, though this approach can reduce the model's learning capacity. Network pruning therefore becomes crucial to reduce the number of model parameters and mitigate the risk of overfitting. In this section, we examine the performance of the network before and after pruning and compare the effects of different pruning strategies. Table 5 reports the model accuracy achieved under different pruning strategies, alongside a comparison with the U-Net network and the lightweight FCRN.

To demonstrate the superiority of the channel attention-based pruning strategy proposed in this article, we compared it with pruning strategies based on the L1 and L2 criteria. At comparable pruning levels, our strategy achieved higher segmentation accuracy than both alternatives, highlighting its effectiveness and efficiency.

After model pruning, the segmentation accuracies of various tissues exhibit differential changes, reflecting the tissue-dependent effects of the pruning strategy on model performance. The results show significant improvements in the accuracy of ONL (+0.17%), IS (+0.12%), OS (+0.46%), and RPE (+0.60%), particularly for OS and RPE. This indicates that the pruning strategy effectively reduces redundant parameters and enhances the model’s ability to focus on key feature extraction for these specific tissues. At the same time, NFL (+0.01%) and OPL (+0.03%) demonstrate modest gains, reflecting a degree of robustness and stability, as their segmentation performance remains largely unaffected by pruning.

However, slight declines in the accuracy of IPL (-0.04%) and INL (-0.08%) suggest that pruning may have compromised the model’s ability to capture the finer details of these tissues. These regions likely require richer feature representations, and the reduction in parameters may have led to insufficient feature extraction, impacting segmentation performance.

Overall, the pruning strategy enhances the segmentation of critical tissues while maintaining the model’s overall performance, with notable improvements in key areas such as OS and RPE. This demonstrates that careful parameter pruning can improve segmentation accuracy for specific important tissues while reducing model complexity. However, further optimization of the pruning strategy is necessary to account for the varying sensitivities of different tissues, ensuring a comprehensive improvement in segmentation performance across all tissues.

In our experiments, the choice of pruning ratio r% plays a crucial role in model performance. The pruning ratio r% determines the percentage of channels removed during the training process, and selecting an appropriate value is essential for balancing the model’s computational efficiency and segmentation accuracy. To ensure the effectiveness of our pruning strategy, we employ a depth-based convolutional kernel grouping approach: the kernels are grouped by depth, and in each group, the bottom r% of the least important kernels are pruned. After each pruning step, the remaining model is fine-tuned, and this process is repeated, pruning r% of the kernels in each group for a total of five iterations. The model with the best performance is selected after all rounds of pruning and fine-tuning.
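The grouped, iterative prune-and-fine-tune loop described above can be sketched as follows (a simplified illustration that tracks only the surviving kernel indices per depth group; the fine-tuning step between rounds is elided, and the names and toy scores are hypothetical):

```python
import numpy as np

def iterative_group_pruning(importance_by_group, r, rounds=5):
    """Prune the bottom r fraction of surviving kernels in each depth group, for several rounds."""
    kept = {d: list(range(len(s))) for d, s in importance_by_group.items()}
    history = []
    for _ in range(rounds):
        for d, scores in importance_by_group.items():
            alive = sorted(kept[d], key=lambda i: scores[i])   # least important first
            k = max(1, int(len(alive) * r))                    # kernels to drop this round
            kept[d] = sorted(alive[k:])
        # (fine-tuning of the surviving sub-network would happen here)
        history.append({d: list(v) for d, v in kept.items()})
    return history

# One depth group of 16 kernels whose importance equals its index
history = iterative_group_pruning({0: np.arange(16.0)}, r=0.25)
```

Because r% is taken of the *surviving* kernels each round, the absolute number removed shrinks over the five iterations, which keeps late rounds from cutting too deeply.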

As shown in the experimental results (Table 6), when the pruning ratio is small (e.g., 5%), the model retains a high number of parameters (5.84M) and performs similarly to the baseline across all layers. As the pruning ratio r% increases, the model’s parameter count decreases gradually, without a significant drop in performance. For example, when the pruning ratio reaches 20%, the model achieves the best balance, with a Dice score of 91.15% and an IoU of 84.06%, indicating that moderate pruning effectively reduces the number of parameters while maintaining high segmentation accuracy. However, when the pruning ratio is increased further (e.g., 30%), we begin to observe a slight decrease in performance, likely due to the removal of important features in the deeper layers. This suggests that while pruning helps to improve computational efficiency, excessive pruning can lead to the loss of essential information, particularly in deeper feature channels. Based on our experimental results, we conclude that a pruning ratio r% between 20% and 25% offers the optimal balance between reducing the number of parameters and maintaining segmentation accuracy. This range allows for significant parameter reduction while preserving key structural and boundary information.

Table 6. Evaluation of the effectiveness of the pruning ratio (r%).

https://doi.org/10.1371/journal.pone.0316089.t006

In conclusion, the results of our experiments clearly indicate that our pruning strategy excels in terms of segmentation accuracy. By considering channel attention, we effectively remove unnecessary channels while maintaining the overall integrity of the segmentation process. We can observe that our parameters are close to FCRN, while IoU has increased by 0.34%.

5.2.4 The summary of ablation experiment.

In order to provide a clearer understanding of the improvement in network accuracy achieved through multiple enhancements discussed in this article, we conducted a comprehensive ablation experiment. The results of this experiment are summarized in Table 7.

From the table, it is evident that our dual task network incorporates boundary regression tasks based on TSDF, enhancing the network’s ability to acquire boundary features. Additionally, we employ a coupling module that utilizes spatial attention to facilitate feature fusion between the two tasks, enabling target segmentation and boundary regression to complement each other. Furthermore, our pruning strategy, which is based on channel attention, not only reduces the number of model parameters but also mitigates the risk of network overfitting.

The experimental results demonstrate that our method improves the average segmentation accuracy in Dice by 0.29% and IoU by 0.47%, compared to the Res-UNet network.

6 Conclusion and discussion

This article presents a new dual-branch network designed to improve the accuracy of retinal OCT segmentation. The network performs target segmentation and boundary regression simultaneously, utilizing a truncated signed distance function to reduce the impact of weak boundaries, together with an adaptive weight designed on top of the TSDF. Additionally, it introduces a spatial attention coupling module to integrate segmentation and boundary regression features, enhancing their complementarity. Furthermore, the article proposes a pruning module based on the channel attention mechanism to reduce model parameters and the risk of overfitting.

This article primarily treats learning the TSDF as a network learning task. In the future, we aim to explore how to better integrate level set algorithms with deep learning, leveraging the bridge established in this work. Additionally, due to the scarcity of labeled medical image data, one of our future research directions is a semi-supervised model that leverages unlabeled data to improve the network's feature extraction.

Our proposed model enables more accurate OCT segmentation, providing quantitative support for subsequent disease diagnosis (e.g., tissue thickness measurements). In actual clinical practice, certain pathologies such as macular edema, glaucoma, and optic nerve diseases can lead to abnormal tissue thickness. Significant thickness abnormalities can be detected easily from the segmentation results, but subtle thickness changes are also associated with certain diseases. In future research, we will further explore the relationship between minor thickness variations and disease, providing more precise diagnostic references for physicians to comprehensively assess the condition and develop more effective treatment plans.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.

References

1. Tan TE, Wong TY. Diabetic retinopathy: Looking forward to 2030. Frontiers in Endocrinology. 2023;13:1077669. pmid:36699020
2. Cennamo G, Romano M, Vecchio E, Minervino C, della Guardia C, Velotti N, et al. Anatomical and functional retinal changes in multiple sclerosis. Eye. 2016;30(3):456–462. pmid:26681148
3. Bussel II, Wollstein G, Schuman JS. OCT for glaucoma diagnosis, screening and detection of glaucoma progression. British Journal of Ophthalmology. 2014;98(Suppl 2):ii15–ii19. pmid:24357497
4. Tan Y, Shen WD, Wu MY, Liu GN, Zhao SX, Chen Y, et al. Retinal layer segmentation in OCT images with boundary regression and feature polarization. IEEE Transactions on Medical Imaging. 2023.
5. Ishikawa H, Stein DM, Wollstein G, Beaton S, Fujimoto JG, Schuman JS. Macular segmentation with optical coherence tomography. Investigative Ophthalmology & Visual Science. 2005;46(6):2012–2017.
6. Lang A, Carass A, Hauser M, Sotirchos ES, Calabresi PA, Ying HS, et al. Retinal layer segmentation of macular OCT images using boundary classification. Biomedical Optics Express. 2013;4(7):1133–1152. pmid:23847738
7. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer; 2015. p. 234–241.
8. Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Transactions on Medical Imaging. 2019;39(6):1856–1867. pmid:31841402
9. Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, et al. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. 2021.
10. Roy AG, Conjeti S, Karri SPK, Sheet D, Katouzian A, Wachinger C, et al. ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks. Biomedical Optics Express. 2017;8(8):3627–3642. pmid:28856040
11. Dinsdale NK, Jenkinson M, Namburete AI. STAMP: Simultaneous Training and Model Pruning for low data regimes in medical image segmentation. Medical Image Analysis. 2022;81:102583. pmid:36037556
12. Obaid KB, Zeebaree S, Ahmed OM, et al. Deep learning models based on image classification: a review. International Journal of Science and Business. 2020;4(11):75–81.
13. Zhao ZQ, Zheng P, Xu ST, Wu X. Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems. 2019;30(11):3212–3232. pmid:30703038
14. Gregor K, Danihelka I, Graves A, Rezende D, Wierstra D. DRAW: A recurrent neural network for image generation. In: International Conference on Machine Learning. PMLR; 2015. p. 1462–1471.
15. Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D. Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021;44(7):3523–3542.
16. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 3431–3440.
17. Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, et al. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999. 2018.
18. He Y, Carass A, Liu Y, Jedynak BM, Solomon SD, Saidha S, et al. Deep learning based topology guaranteed surface and MME segmentation of multiple sclerosis subjects from retinal OCT. Biomedical Optics Express. 2019;10(10):5042–5058. pmid:31646029
19. He Y, Carass A, Liu Y, Jedynak BM, Solomon SD, Saidha S, et al. Structured layer surface segmentation for retina OCT using fully convolutional regression networks. Medical Image Analysis. 2021;68:101856. pmid:33260113
20. Wang B, Wei W, Qiu S, Wang S, Li D, He H. Boundary aware U-Net for retinal layers segmentation in optical coherence tomography images. IEEE Journal of Biomedical and Health Informatics. 2021;25(8):3029–3040. pmid:33729959
  21. 21. He K, Lian C, Zhang B, Zhang X, Cao X, Nie D, et al. HF-UNet: learning hierarchically inter-task relevance in multi-task U-net for accurate prostate segmentation in CT images. IEEE transactions on medical imaging. 2021;40(8):2118–2128. pmid:33848243
  22. 22. Frankle J, Carbin M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:180303635. 2018;.
  23. 23. Han S, Pool J, Tran J, Dally W. Learning both weights and connections for efficient neural network. Advances in neural information processing systems. 2015;28.
  24. 24. Park JJ, Florence P, Straub J, Newcombe R, Lovegrove S. Deepsdf: Learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2019. p. 165–174.
  25. 25. Behrad F, Abadeh MS. Evolutionary convolutional neural network for efficient brain tumor segmentation and overall survival prediction. Expert Systems with Applications. 2023;213:118996.
  26. 26. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. p. 7132–7141.
  27. 27. Zhu W, Chen X, Qiu P, Farazi M, Sotiras A, Razi A, et al. SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2024. p. 601–611.
  28. 28. Diakogiannis FI, Waldner F, Caccetta P, Wu C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS Journal of Photogrammetry and Remote Sensing. 2020;162:94–114.