
ACL-DUNet: A tumor segmentation method for breast ultrasound images based on multiple attention mechanisms and dense connections

  • Hao Zhang,

    Roles Formal analysis, Methodology, Writing – review & editing

    Affiliation School of Computer Science and Technology, Xinjiang University, Urumqi, Xinjiang, China

  • He Liang,

    Roles Funding acquisition

    Affiliation Department of Electronic Engineering, and Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China

  • Guo Wenjia,

    Roles Funding acquisition, Writing – review & editing

    Affiliation Cancer Institute, Affiliated Cancer Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China

  • Ma Jing,

    Roles Funding acquisition, Writing – review & editing

    Affiliation School of Computer Science and Technology, Xinjiang University, Urumqi, Xinjiang, China

  • Sun Gang,

    Roles Funding acquisition, Writing – review & editing

    Affiliations Department of Breast and Thyroid Surgery, The Affiliated Cancer Hospital of Xinjiang Medical University, Urumqi, Xinjiang, P.R. China, Xinjiang Cancer Center/Key Laboratory of Oncology of Xinjiang Uyghur Autonomous Region, Urumqi, Xinjiang, P.R. China

  • Ma Hongbing

    Roles Funding acquisition, Writing – review & editing

    hbma@tsinghua.edu.cn

    Affiliation Department of Electronic Engineering, and Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China

Abstract

Breast cancer is the most common cancer in women. Breast masses are one of the distinctive signs used to diagnose breast cancer, and ultrasound is widely used for screening as a non-invasive and effective method of breast examination. In this study, we used the Mendeley and BUSI datasets, comprising 250 images (100 benign, 150 malignant) and 780 images (133 normal, 437 benign, 210 malignant), respectively. Each dataset was split into 80% for training and 20% for validation. The accurate measurement and characterization of breast tumors, in particular their area and shape, play a crucial role in guiding clinical decision-making. In this study, we propose a deep learning method for mass segmentation in breast ultrasound images that uses a densely connected U-Net with attention gates (AGs), channel attention modules, and scale attention modules for accurate breast tumor segmentation. The densely connected network is employed in the encoding stage to enhance the network's feature extraction capabilities, and three attention modules are integrated in the decoding stage to better capture the most relevant features. Validated on the Mendeley and BUSI datasets, our method achieves Dice Similarity Coefficients (DSC) of 0.8764 and 0.8313, respectively, outperforming other deep learning approaches. The source code is located at github.com/zhanghaoCV/plos-one.

1 Introduction

Breast cancer is the most common cancer type in women and one of the leading causes of female mortality globally [1, 2]. According to the World Health Organization (WHO), breast cancer has the highest incidence and mortality rates among female cancers worldwide. Early diagnosis and accurate treatment are crucial for achieving favorable prognosis and survival for breast cancer patients [3]. Ultrasound imaging is a well-established tool for the early assessment and diagnosis of abnormal breast conditions owing to its non-invasive imaging procedure, high sensitivity, and cost-effectiveness [4, 5], and it is therefore widely used for breast cancer screening. It is, however, primarily used as an adjunct to mammography rather than as a standalone screening tool: mammography remains the standard screening method, and breast ultrasound is typically used to verify mammographic findings or guide biopsies, especially in patients with dense breast tissue, for whom it can provide clearer images than mammography. However, due to the complex structure of the breast, standard breast ultrasound imaging is associated with high inter-reader variability and false positives; radiologists sometimes struggle to accurately locate lesions, and the heavy workload can lead to missed diagnoses and misdiagnoses, resulting in unnecessary biopsies. For rapid and effective breast cancer screening, there is thus a critical need for automated, accurate breast tumor segmentation methods to assist radiologists in diagnosing and treating breast cancer. Computer-Aided Diagnosis (CAD) technology has therefore emerged, bringing significant technological breakthroughs to medical diagnosis [6] and reducing the workload of physicians. Breast masses are one of the most distinctive manifestations used to diagnose breast cancer, and their edge information reflects their growth patterns and biological characteristics. Generally, benign masses exhibit regular shapes, while irregular margins are often associated with malignancy. In other words, the accuracy of mass segmentation directly affects the benign or malignant classification of the masses. Precise segmentation of breast tumors is therefore of paramount importance for classifying ultrasound images as benign or malignant.

Methods for breast tumor segmentation can be categorized into traditional methods and deep learning methods. Many traditional methods have been proposed for accurate breast tumor segmentation, such as active contour-based methods [7, 8], threshold-based segmentation methods [9, 10], and graph-based segmentation methods [11, 12]. In previous literature [13–15], machine learning techniques such as logistic regression, random forests, and feedforward neural networks have been applied to the diagnosis of gastric cancer and breast cancer-related diseases. These traditional methods are straightforward but require extensive domain knowledge and expertise to extract color, shape, and texture features. Additionally, they are sensitive to noise and tend to produce over-segmentation. In recent years, driven by ever-increasing computing power and data availability, deep learning has made significant progress in medical imaging [16–22]. In particular, Convolutional Neural Networks (CNNs) can capture the nonlinear mapping between input and output and automatically learn local features and high-level abstract features through multi-layer network structures, which are usually better than manually extracted, predefined feature sets. Deep learning models, especially the fully convolutional network (FCN) [23] and U-Net [24], have been successfully applied to this field and achieve outstanding performance compared with conventional approaches. For example, Yap et al. [25] developed several FCN-based variants for the semantic segmentation of breast lesions in BUS images. Hu et al. [26] investigated the effectiveness of FCNs and U-Net for breast mass segmentation. Almajalid et al. [27] modified and improved U-Net for lesion segmentation based on contrast-enhanced and speckle-reduced BUS images. However, pattern complexity and intensity similarity between the surrounding tissue (i.e., background) and lesion regions (i.e., foreground) increase the difficulty of lesion segmentation [28], placing higher demands on network segmentation performance. Moreover, the basic architectures of U-Net and many of its variants consist of an encoder-decoder structure with a fixed receptive field. Due to the sequential optimization strategy of multi-scale features, the classical U-Net suffers from vanishing gradients and related optimization issues. Furthermore, contextual information from multiple scales is not properly converged into the reconstruction of segmentation masks [29], which may fail to fully exploit the fusion of coarse-to-fine features from both the encoder and decoder. Finally, the structures and edges of lesions in BUS images are often blurry, which makes it challenging to learn lesion structure and edge information and degrades algorithm performance [30].

To address the above problems, we propose a novel variant of the U-Net model, ACL-DUNet, which incorporates dense connections and multiple attention mechanisms (attention gates, channel attention, and scale attention) to improve segmentation accuracy. Our model aims to address the limitations of previous methods by enhancing feature extraction and focusing on relevant regions in the image. The model makes the following three contributions:

• A novel variant of the U-Net model for breast tumor segmentation is proposed:

We have adopted a densely connected CNN structure [31] as the encoder part. This allows each feature extraction layer to receive inputs directly from the feature maps of all previous layers. Not only does this method enable feature reuse, but it also allows for the extraction of breast tumor features of various shapes without adding extra parameters. Moreover, this structure helps to mitigate the problem of vanishing gradients, enhancing the model’s stability and training efficiency.

• Multiple attention mechanisms are integrated in the decoder part:

In the decoder part, we have integrated a CNN with Attention Gates (AGs) [32], Channel Attention [33, 34], and Scale Attention [35]. It has been demonstrated that saliency maps can provide clues/priors by highlighting visually significant regions or objects in images, thereby improving the network’s segmentation performance. For instance, Vakanski et al. [36] proposed integrating the visual saliency of lesion areas into the network, where saliency is introduced through several attention modules. The integration of these attention modules helps achieve more precise segmentation results. During the training process, Attention Gates (AGs) utilize spatial attention to enhance regions of interest on the feature map while suppressing potential background or irrelevant areas. This strengthens the relationships between pixels, enabling the network to better focus on the segmentation target [32]. Channel attention is used within the network to calibrate the cascade of low-level and high-level features, allowing more relevant channels to obtain higher coefficient weights. Feature channels in the encoder primarily contain low-level information, while those in the decoder carry more semantic information. Therefore, they may have different levels of importance for the segmentation task. To better utilize the most informative feature channels, we introduce channel attention to automatically highlight relevant feature channels while suppressing irrelevant ones. Scale Attention Modules are used to capture feature maps at different scales in the backbone network. To better handle objects of varying scales, a reasonable approach is to combine these features for final prediction. However, for a given object, feature maps at different scales might have varying degrees of relevance. The ideal practice is to automatically determine scale-specific weights for each pixel, allowing the network to adapt to the appropriate scale of the given input. Thus, we employ Scale Attention Modules, which automatically learn image-specific weights for each scale to calibrate features at different scales. This module is used at the end of the network [35].

• Performance enhancement and effect validation:

By combining densely connected structures and multiple attention mechanisms, the proposed network architecture outperforms the basic U-Net and other existing methods in the task of breast tumor segmentation. Furthermore, we evaluated the proposed method on two publicly available breast ultrasound datasets (Mendeley and BUSI) and demonstrated its superior performance compared to existing approaches, validating the effectiveness of the new architecture.

The remainder of this article is organized as follows: In Section 2, the datasets used in this study will be introduced, and a detailed description of the proposed ACL-DUNet will be provided. In Section 3, the experimental setup and results will be presented and analyzed. The discussion and conclusion will be presented in Sections 4 and 5, respectively.

2 Method

2.1 Network structure

The datasets used in this work have the following characteristics: a) Different breast tumors exhibit significant variations in size and shape; b) The size of the tumors is relatively small compared to the background. Considering these characteristics, we propose a method based on the encoder-decoder architecture, where the encoder employs a densely connected network, and the decoder incorporates Attention Gates (AGs), Channel Attention (CA), and Scale Attention (LA). The details of using these methods are explained later. The complete structure of the proposed ACL-DUNet is illustrated in Fig 1.

2.2 Dense connection networks

Firstly, breast tumors often exhibit variations in shape and size, and the limited availability of training images makes it challenging to segment breast tumors of different shapes and sizes on a small dataset. To address this issue, we adopt the dense connection proposed by G. Huang et al. [31]. This approach connects each layer to every other layer in a feed-forward manner: each layer receives additional inputs from all preceding layers and passes its own feature maps to all subsequent layers. This dense connection scheme enables feature reuse, allowing the network to better utilize features and achieve higher segmentation accuracy for small targets with significant scale variations. Additionally, it improves gradient backpropagation, making the network easier to train, since each layer can directly receive the error signal from the final output, enabling implicit "deep supervision." A traditional CNN connects only the output feature map of the (l−1)-th layer as input to the l-th layer, whereas the l-th layer of a densely connected network receives the feature maps of all preceding layers, x_0, x_1, …, x_{l−1}, as input:

$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}]) \tag{1}$$

where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps generated in layers 0, …, l − 1, and $H_l(\cdot)$ is a composite function of three consecutive operations: Batch Normalization (BN), ReLU, and a 3×3 convolution (Conv).
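As a minimal sketch of this idea (not the authors' exact implementation; the growth rate and number of layers are illustrative assumptions), a dense block built from the composite function $H_l$ can be written in PyTorch as:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One composite function H_l: BN -> ReLU -> 3x3 Conv, as in Eq (1)."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all preceding feature maps."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate x_0 ... x_{l-1} along the channel dimension, then apply H_l
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```

Because each layer sees all earlier feature maps, gradients flow directly to early layers, which is the source of the implicit deep supervision described above.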

2.3 Attention module

Humans possess a unique visual attention mechanism that allocates more attention to target regions and suppresses irrelevant or distracting information. Incorporating such an attention mechanism into breast tumor segmentation can therefore help the network identify tumor regions that are small relative to the background in breast ultrasound images. The AGs proposed by Oktay et al. [32] fit this attention mechanism: their main goal is to extract the information most relevant to the current task from a large amount of information, while improving network performance by suppressing task-irrelevant features. The structure of the AGs is shown in Fig 2. Here x is the feature map input from the encoder, g is the feature map from upsampling, and α is the attention coefficient generated by the network. The output of the AG is the element-wise multiplication of the two:

$$\hat{x} = \alpha \cdot x \tag{2}$$

In the multi-class (multi-semantic) case, multi-dimensional attention coefficients must be learned.

Our method uses additive attention instead of multiplicative attention to obtain the gating coefficients. Although additive attention introduces some computational overhead, it has been shown to achieve higher accuracy than multiplicative attention. The additive attention is formulated as:

$$\alpha = \sigma_2\left(\psi^{T}\,\sigma_1\left(W_x^{T} x + W_g^{T} g + b_g\right) + b_\psi\right) \tag{3}$$

where σ1 is chosen as the ReLU function, σ1(m) = max(0, m); σ2 is the sigmoid function, σ2(m) = 1/(1 + e^{−m}); Wx, Wg, and ψ are linear transformations; and bg and bψ are bias terms. The parameters of the AGs are initialized from a normal distribution and updated according to the backpropagation principle.
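The following PyTorch sketch illustrates one plausible implementation of Eqs (2) and (3). The channel dimensions and the assumption that x and g have already been resampled to the same spatial size are ours, not taken from the paper:

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate: alpha = sigma2(psi(sigma1(Wx*x + Wg*g + bg)) + bpsi)."""
    def __init__(self, x_channels: int, g_channels: int, inter_channels: int):
        super().__init__()
        self.wx = nn.Conv2d(x_channels, inter_channels, kernel_size=1, bias=False)
        self.wg = nn.Conv2d(g_channels, inter_channels, kernel_size=1, bias=True)  # bias b_g
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1, bias=True)          # bias b_psi
        self.relu = nn.ReLU(inplace=True)       # sigma_1
        self.sigmoid = nn.Sigmoid()             # sigma_2

    def forward(self, x, g):
        # x: encoder feature map; g: gating signal from the upsampling path,
        # assumed to share the spatial size of x
        alpha = self.sigmoid(self.psi(self.relu(self.wx(x) + self.wg(g))))  # Eq (3)
        return x * alpha                                                    # Eq (2)
```

The 1×1 convolutions play the role of the linear transformations Wx, Wg, and ψ applied per spatial location.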

In our network, unlike the previous SE block [33] that only used average pooling, we added a max pooling operation to preserve more information [34]. This Channel Attention (CA) module receives both low-level features from the encoder calibrated by AGs and high-level features from the decoder. The feature channels from the encoder contain more fine-grained information, while the feature channels from the decoder contain more coarse-grained information. To better utilize the useful feature channels, we introduced this Channel Attention module. The details of this module are shown in Fig 3.

Fig 3. Structure of our proposed channel attention module with residual connection.

https://doi.org/10.1371/journal.pone.0307916.g003

In our setting, let x denote the concatenated input feature map with C channels. A global average pooling and a global max pooling are first applied to obtain the global information of each channel. A multilayer perceptron (MLP) then produces the channel attention coefficient β ∈ [0,1]^{C×1×1}. The MLP consists of two fully connected layers: the first has C/r output channels and is followed by a ReLU activation, and the second has C output channels. The two pooled results passed through the MLP are summed and fed through a sigmoid activation to obtain β. The output of our channel attention module, with the residual connection shown in Fig 3, is:

$$\hat{x} = x \cdot \beta + x \tag{4}$$
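A sketch of such a channel attention module in PyTorch is given below; the reduction ratio r = 16 is an assumption, and the residual form follows Eq (4):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention with both average and max pooling, plus a residual connection."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        # Shared two-layer MLP: C -> C/r -> C
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))              # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))               # global max pooling branch
        beta = self.sigmoid(avg + mx).view(b, c, 1, 1)  # beta in [0,1]^{C x 1 x 1}
        return x * beta + x                             # Eq (4): recalibration + residual
```

Using both pooling branches preserves more channel statistics than average pooling alone, which is the motivation stated above for extending the SE block [33] in the style of CBAM [34].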

The backbone of our network produces feature maps at different scales, and to handle objects of different scales, these multi-scale feature maps can be combined for prediction. Gu et al. [35] proposed a scale attention (LA) module built on top of the channel attention module. This module automatically determines a scale-specific weight for each pixel to calibrate features at different scales, so that the network can adapt to the appropriate scale of a given input image. The structure of the scale attention module is shown in Fig 4.

Fig 4. Structure of our proposed scale attention module with residual connection.

https://doi.org/10.1371/journal.pone.0307916.g004

First, the decoder of the network has four layers, so there are feature maps at four scales; we use bilinear interpolation to restore these four feature maps to the size of the original input image. To reduce computation, a 1×1 convolutional layer compresses each feature map to 4 channels, and the compressed results from the different scales are concatenated into a 16-channel hybrid feature map F. Similar to the CA module, the scale attention coefficient is denoted γ ∈ [0,1]^{4×1×1}. To assign attention weights to individual pixels, a spatial attention block takes F · γ as input and generates spatial attention coefficients γ* ∈ [0,1]^{H×W}, so that γ · γ* represents pixel-wise attention. The final output of the LA module is:

$$\hat{F} = F \cdot (\gamma \cdot \gamma^{*}) + F \tag{5}$$
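The sketch below illustrates the scale attention computation under our reading of this description; the MLP sizes and the design of the spatial attention block are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttention(nn.Module):
    """LA module sketch after Gu et al. [35]: four decoder maps are upsampled,
    compressed to 4 channels each, concatenated into a 16-channel hybrid map,
    then calibrated per scale (gamma) and per pixel (gamma*), as in Eq (5)."""
    def __init__(self, decoder_channels, num_classes: int = 2):
        super().__init__()
        self.compress = nn.ModuleList(
            nn.Conv2d(c, 4, kernel_size=1) for c in decoder_channels
        )
        # SE-style MLP producing one coefficient per scale: gamma in [0,1]^{4x1x1}
        self.mlp = nn.Sequential(nn.Linear(16, 8), nn.ReLU(inplace=True), nn.Linear(8, 4))
        self.spatial = nn.Conv2d(16, 1, kernel_size=3, padding=1)  # spatial attention block
        self.out = nn.Conv2d(16, num_classes, kernel_size=1)

    def forward(self, feats, out_size):
        # feats: list of 4 decoder feature maps at different scales
        up = [F.interpolate(conv(f), size=out_size, mode="bilinear", align_corners=False)
              for conv, f in zip(self.compress, feats)]
        x = torch.cat(up, dim=1)                               # 16-channel hybrid map F
        b = x.size(0)
        gamma = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))))    # (b, 4): one weight per scale
        gamma = gamma.repeat_interleave(4, dim=1).view(b, 16, 1, 1)
        gamma_star = torch.sigmoid(self.spatial(x * gamma))    # per-pixel coefficients
        x = x * (gamma * gamma_star) + x                       # Eq (5) with residual
        return self.out(x)
```

Because this module fuses the calibrated multi-scale features into the final prediction, it sits at the end of the network, as stated above.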

3 Experimental results

In this section, we begin by presenting the experimental setup, including parameter configurations, evaluation strategies and datasets. Subsequently, we provide a detailed account of the experimental results obtained from different methods on two distinct breast ultrasound datasets to validate the effectiveness of the proposed approach.

3.1 Dataset description

• The Mendeley ultrasound dataset (Paulo Sergio Rodrigues, 2017) [29, 37] includes 100 benign and 150 malignant images. The native resolution of the ultrasound images is 64×64 pixels, which is converted to 128×128 pixels. The dataset was originally intended for classification and does not provide ground truth masks; therefore, with the help of experienced radiologists, the benign and malignant tumor images were annotated for model training. Sample images are shown in Fig 5.

• The BUSI dataset was provided by Baheya Hospital in Cairo, Egypt (Al-Dhabyani et al., 2020) [38]. All ultrasound images were acquired with LOGIQ E9 and LOGIQ E9 Agile ultrasound systems. The dataset comprises 780 images in total: 133 normal, 437 benign, and 210 malignant. Since the other dataset used in our study includes only one breast mass per image, we removed the US images from the BUSI dataset that contain multiple breast masses to make later performance comparisons more straightforward. This modification resulted in 630 US images corresponding to 421 benign and 209 malignant breast masses. Ground truth masks are provided by radiologists at an image resolution of 1280×1024, and nearest-neighbor interpolation is used to scale the images to 256×256. Sample images are shown in Fig 5.

Fig 5. Sample images from the Mendeley dataset and benign and malignant images from the BUSI dataset.

https://doi.org/10.1371/journal.pone.0307916.g005

3.2 Experimental setup

The method was implemented using Python 3.6 and the PyTorch deep learning library on an Ubuntu 18.04 system with an NVIDIA Tesla T4 GPU. The images from both datasets were provided in PNG format. For experimentation, each dataset was divided into a training set (80%) and a testing set (20%). The training process used the Adam optimizer with a momentum term of 0.9. All parameters were initialized using the "he_normal" method [39]. The batch size, learning rate, and number of epochs were set to 8, 0.0001, and 200, respectively. The loss function employed for training was the cross-entropy loss, which measures the difference between predicted values and ground-truth labels; the loss is propagated through backpropagation to update the weights and biases of the network's layers.
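A minimal training-loop sketch matching this configuration (Adam with β1 = 0.9, learning rate 0.0001, batch size 8, 200 epochs, cross-entropy loss) might look as follows; ACLDUNet and train_dataset are hypothetical placeholders for the proposed network and the 80% training split:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ACLDUNet().to(device)                       # hypothetical model class
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()                   # pixel-wise cross-entropy

model.train()
for epoch in range(200):
    for images, masks in train_loader:              # images: (B,1,H,W); masks: (B,H,W) in {0,1}
        images, masks = images.to(device), masks.to(device)
        logits = model(images)                      # (B,2,H,W): background/tumor scores
        loss = criterion(logits, masks.long())
        optimizer.zero_grad()
        loss.backward()                             # backpropagate; update weights and biases
        optimizer.step()
```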

Before conducting the experiments, we also performed a series of preprocessing steps on the datasets. First, we performed data cleaning, manually inspecting both datasets and removing images that did not meet the requirements. Second, the input data were resized and randomly flipped horizontally and vertically. Finally, the data were standardized using fixed mean and standard deviation values. A sketch of such a pipeline is shown below.
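The following torchvision sketch illustrates one possible realization of this preprocessing; the flip probabilities and normalization statistics are illustrative assumptions, and in practice the same geometric transforms must be applied to the segmentation masks:

```python
import torchvision.transforms as T

# Augmentation and normalization pipeline for the training images.
train_transform = T.Compose([
    T.Resize((256, 256)),                # 128x128 for the Mendeley images
    T.RandomHorizontalFlip(p=0.5),       # probabilistic horizontal flip
    T.RandomVerticalFlip(p=0.5),         # probabilistic vertical flip
    T.ToTensor(),
    T.Normalize(mean=[0.5], std=[0.5]),  # single-channel ultrasound; values assumed
])
```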

3.3 Evaluation measures

In this work, we used five metrics to evaluate the performance of the model, namely the Dice Similarity Coefficient (DSC), Jaccard Similarity Coefficient (JSC), Positive Predictive Value (PPV), Sensitivity (SEN), and F1-score. The evaluation metrics are defined as follows:

$$\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN} \tag{6}$$

$$\mathrm{JSC} = \frac{TP}{TP + FP + FN} \tag{7}$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{8}$$

$$\mathrm{SEN} = \frac{TP}{TP + FN} \tag{9}$$

$$\mathrm{F1} = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{SEN}}{\mathrm{PPV} + \mathrm{SEN}} \tag{10}$$

In this context, TP (True Positive) is the number of tumor pixels correctly predicted as tumor, FP (False Positive) is the number of background pixels incorrectly predicted as tumor, FN (False Negative) is the number of tumor pixels incorrectly predicted as background, and TN (True Negative) is the number of background pixels correctly predicted as background. All evaluation metrics, including DSC, JSC, PPV, SEN, and F1-score, range from 0 to 1, with values closer to 1 indicating better performance: a value of 1 indicates perfect segmentation, meaning the prediction matches the ground truth completely, while a value of 0 indicates no overlap between the prediction and the ground truth.
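These metrics can be computed directly from binary prediction and ground truth masks; the sketch below follows Eqs (6)–(10), with a small epsilon added for numerical stability (our addition):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> dict:
    """Compute DSC, JSC, PPV, SEN, and F1 from binary masks (1 = tumor, 0 = background)."""
    tp = np.sum((pred == 1) & (gt == 1))   # tumor pixels correctly predicted as tumor
    fp = np.sum((pred == 1) & (gt == 0))   # background pixels predicted as tumor
    fn = np.sum((pred == 0) & (gt == 1))   # tumor pixels predicted as background
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    jsc = tp / (tp + fp + fn + eps)
    ppv = tp / (tp + fp + eps)
    sen = tp / (tp + fn + eps)
    f1 = 2 * ppv * sen / (ppv + sen + eps)
    return {"DSC": dsc, "JSC": jsc, "PPV": ppv, "SEN": sen, "F1": f1}
```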

3.4 Experimental results

3.4.1 Ablation studies.

Table 1 shows the results of the networks with and without Dense Connections (DC) and the three attention modules (ACL) on the Mendeley dataset. On the original dataset, the network with only DC achieved average results of DSC (0.8270), JSC (0.7169), PPV (0.9255), SEN (0.7613), and F1 (0.8354). The network with only the ACL modules achieved average results of DSC (0.8646), JSC (0.7669), PPV (0.9339), SEN (0.8109), and F1 (0.8680). In contrast, the ACL-DUNet, which combines DC and ACL, achieved the following average scores on each evaluation metric: DSC (0.8764), JSC (0.7807), PPV (0.9482), SEN (0.8152), and F1 (0.8767). Compared to the network with only DC, ACL-DUNet showed significant improvements in DSC (4.94%), JSC (6.38%), PPV (2.27%), SEN (5.39%), and F1 (4.13%). Moreover, when compared to the network with only ACL, ACL-DUNet also demonstrated improvements in DSC (1.18%), JSC (1.38%), PPV (1.43%), SEN (0.43%), and F1 (0.87%) on all five evaluation metrics. The segmentation results on this dataset are shown in Fig 6.

Fig 6. Segmentation results on the Mendeley dataset in the ablation experiment.

Blue: ground truth; pink: predicted segmentation.

https://doi.org/10.1371/journal.pone.0307916.g006

Table 2 presents the results of the networks with and without Dense Connections (DC) and the three attention modules (ACL) on the BUSI dataset. On the BUSI dataset, the network with only DC achieved average results of DSC (0.7495), JSC (0.6569), PPV (0.8001), SEN (0.7564), and F1 (0.7776). The network with only the ACL modules achieved average results of DSC (0.8111), JSC (0.7152), PPV (0.8157), SEN (0.8660), and F1 (0.8401). In contrast, the ACL-DUNet, which combines DC and ACL, achieved the following average scores on each evaluation metric: DSC (0.8313), JSC (0.7344), PPV (0.8292), SEN (0.8703), and F1 (0.8492). Compared to the network with only DC, ACL-DUNet showed improvements in DSC (8.18%), JSC (7.75%), PPV (2.91%), SEN (11.39%), and F1 (7.16%). Moreover, when compared to the network with only ACL, ACL-DUNet also demonstrated improvements in DSC (2.02%), JSC (1.92%), PPV (1.35%), SEN (0.43%), and F1 (0.91%) on all five evaluation metrics. The segmentation results on this dataset are shown in Fig 7.

Fig 7. Segmentation results on the BUSI dataset in the ablation experiment.

Blue: ground truth; pink: predicted segmentation.

https://doi.org/10.1371/journal.pone.0307916.g007

Some challenging visual results are depicted in Figs 6 and 7. Fig 6 shows the results of the ablation experiment on the Mendeley dataset. The first row contains three segmentation images from ACL-DUNet with both Dense Connections (DC) and attention modules (ACL). The second row shows the segmentation images from the variant with only the ACL modules, and the third row displays those from the variant with only DC. The blue lines represent the ground truth, and the pink lines represent the predicted segmentation. In the first row, the second and third images exhibit the best segmentation performance among the ablation experiments, while the first image performs similarly to the images in the second row. The segmentation results of the variant with only DC (third row) consistently perform the worst, which aligns with the findings in Table 1. Fig 7 presents the results of the ablation experiment on the BUSI dataset with the same row layout: the first row shows ACL-DUNet with both DC and ACL, the second row the variant with only ACL, and the third row the variant with only DC. Again, the blue lines represent the ground truth and the pink lines the predicted segmentation. The first and second images in the first row show excellent segmentation performance, closely approximating the ground truth. The second image exhibits slight over-segmentation, but its overall result is still close to the ground truth, especially around the tumor boundary. However, the segmentation results for the malignant tumor (third column) are unsatisfactory for all configurations. These results provide visual evidence that the proposed ACL-DUNet with both Dense Connections and attention modules achieves superior segmentation performance on both datasets compared to the ablation variants with only one of the components.

3.4.2 Comparison with other methods.

Table 3 presents the performance of ACL-DUNet on the Mendeley dataset compared to other state-of-the-art methods [24, 29, 32, 40, 41]. It can be observed that our network outperforms all the compared methods in all five evaluation metrics. On the Mendeley dataset, ACL-DUNet achieves a 5.35% improvement in DSC measurement, a 6.83% improvement in JSC measurement, a 4.88% improvement in PPV measurement, a 4.29% improvement in SEN measurement, and a 4.57% improvement in F1 measurement compared to U-Net. The second-best results are observed in PDF-Unet, with a DSC measurement of 0.8574, a JSC measurement of 0.7594, a SEN measurement of 0.7946, and an F1 measurement of 0.8638. Additionally, the second-best PPV measurement is achieved by the AttU-Net, with a PPV measurement of 0.9466. On the other hand, the worst-performing network in all five metrics is U-Net++, with measurements of DSC (0.8118), JSC (0.7001), PPV (0.8747), SEN (0.7700), and F1 (0.8190). The segmentation results of these methods on images can be seen in Fig 8.

Fig 8. Segmentation results of our method and the comparison methods on the Mendeley dataset.

https://doi.org/10.1371/journal.pone.0307916.g008

Table 4 illustrates the performance of ACL-DUNet on the BUSI dataset compared to other state-of-the-art methods. It can be observed that our network outperforms all the compared methods in overall performance. On the BUSI dataset, ACL-DUNet achieves a 9.31% improvement in DSC measurement, a 10.06% improvement in JSC measurement, a 13.38% improvement in SEN measurement, and a 6.81% improvement in F1 measurement compared to U-Net. Although there is a slight 0.22% decrease in PPV, considering the improvement in F1, it falls within an acceptable range. The second-best overall results are observed in MSU-Net, with a DSC measurement of 0.8055, a JSC measurement of 0.7175, a PPV measurement of 0.8573, a SEN measurement of 0.8140, and an F1 measurement of 0.8351. Additionally, MSU-Net achieves the best PPV measurement among all six networks, with a PPV measurement of 0.8573. On the other hand, U-Net performs the worst in overall performance among the five metrics, with measurements of DSC (0.7382), JSC (0.6338), PPV (0.8314), SEN (0.7365), and F1 (0.7811). The segmentation results of these methods on images can be seen in Fig 9.

Fig 9. Segmentation results of our method and the comparison methods on the BUSI dataset.

https://doi.org/10.1371/journal.pone.0307916.g009

Some challenging visual results are depicted in Figs 8 and 9. Fig 8 shows the segmentation results of the comparison experiment on the Mendeley dataset. The blue lines represent the ground truth, and the pink lines represent the predicted segmentation. It can be observed that the segmentation results of all models in the first row are quite good. In the second row, there is a noticeable difference in segmentation between the first two images, while the following four images perform similarly, with PDF-UNet showing smoother predicted contours. In the third row, the deviations of ACL-DUNet from the ground truth are the least prominent, indicating its superior performance. Fig 9 displays the segmentation results of the comparison experiment on the BUSI dataset. Similarly, in the first row there is little difference in performance among the models, but MSU-Net and ACL-DUNet show closer alignment between the blue and pink lines. In the second row, ACL-DUNet shows almost complete overlap with the ground truth, indicating its superior performance, followed by PDF-UNet. In the third row, the segmentation of malignant tumors is challenging, and none of the models achieve complete segmentation.

4 Discussion

Breast cancer is a major cause of cancer-related death among women worldwide, and the automatic segmentation of breast masses in ultrasound images is of significant clinical importance, as it helps doctors detect early signs of breast cancer. In the task of breast tumor segmentation, the targets may exhibit substantial variations in position, texture, shape, and scale, making awareness of spatial sizes and positions crucial for any network to achieve accurate segmentation. In this study, we explored a novel approach for breast ultrasound image segmentation, referred to as ACL-DUNet. This method combines the U-Net architecture with dense connections, attention gates (AGs), channel attention, and scale attention. The densely connected CNN structure is utilized in the down-sampling stage of the network as a feature extractor, allowing for feature reuse and the extraction of breast tumor features of various shapes without increasing the number of parameters; it also helps alleviate the vanishing gradient problem. The attention gates (AGs), channel attention, and scale attention are placed in the decoder part of the network. These attention modules aid in obtaining more accurate segmentation results. AGs enhance the regions of interest on the feature maps while suppressing potential background or irrelevant regions. Channel attention is used to calibrate the concatenation of low-level and high-level features in the network, giving higher weights to more relevant channels and highlighting the most useful feature channels while suppressing irrelevant ones. The scale attention aims to better integrate the original semantic predictions obtained from the decoder, and it is therefore placed at the end of the network.

In the evaluation on the two public datasets, our proposed ACL-DUNet achieved excellent segmentation scores. In the ablation study, the network combining Dense Connections (DC) and ACL outperformed the networks with only DC or only ACL, ranking first in all five evaluation metrics. Tables 3 and 4 present a comparison between our ACL-DUNet and other research methods on the Mendeley and BUSI datasets. On the Mendeley dataset, our model achieved the highest scores in all five evaluation metrics. Similarly, on the BUSI dataset, our method outperformed the others in the DSC, JSC, SEN, and F1 metrics. The overall results indicate that the proposed approach can effectively handle breast ultrasound tumor segmentation tasks and support accurate diagnosis of breast cancer.

Despite achieving good performance in this study, there are still some areas for improvement and optimization in the ACL-DUNet. For instance, optimizing the dense connection network to enhance the training speed would be beneficial. Additionally, leveraging post-processing methods, such as Conditional Random Fields, could further improve the automatic segmentation results.

This study also has certain limitations. Firstly, there are limitations related to the datasets used: the model was trained and tested on two relatively small breast ultrasound datasets. Small-scale datasets may limit the model's ability to generalize to larger or more diverse data, so although the model performs well on these datasets, its performance may decline in actual clinical applications. Secondly, data diversity is insufficient: the datasets used may not cover all types of breast tumors, especially rare or atypical tumors, which could leave the model unable to recognize unseen tumor types. Finally, the model was not validated on real-time images acquired in uncontrolled settings, was not tested in real-world clinical situations, and was not compared against fellowship-trained or experienced annotators, which restricts its widespread applicability and leaves potential sources of training bias unexamined.

Next, there are limitations regarding the model structure. Firstly, the introduction of multiple attention mechanisms (spatial, channel, and scale attentions) has improved segmentation accuracy but has also increased the model’s complexity and computational cost. Deploying such models may be impractical in resource-constrained environments (such as certain clinical devices). Secondly, there are flexibility constraints; the optimization and performance of the model heavily rely on a specific network architecture (such as variants of U-Net). This design may have limited adaptability when faced with new problems that require different architectural features.

Additionally, there are limitations related to hyperparameters and training details. Firstly, model performance depends on the setting of hyperparameters, such as learning rate, batch size, and regularization coefficients. Optimal values for these parameters are typically obtained through repeated experimentation and may no longer be optimal on different datasets or with updated data. Secondly, despite the use of batch normalization and attention mechanisms, the training process of the model might still face issues with instability and difficulty in convergence, especially in cases with large parameter spaces and multiple layers.

Lastly, there are practical challenges in the technological application. Although the model performs well on public datasets, it lacks extensive validation in actual clinical settings. Differences in breast ultrasound images due to varying equipment and operating conditions may lead to decreased model performance.

5 Conclusion

In this study, we propose a densely connected U-Net model that integrates three different attention mechanisms: spatial attention gates, channel attention, and scale attention, for precise segmentation of tumors in breast ultrasound images. Through extensive testing on two publicly available breast ultrasound datasets, our approach has demonstrated the following major achievements and breakthroughs:

1. Improvement in Segmentation Accuracy: Our model has achieved superior segmentation accuracy on the Mendeley and BUSI datasets compared to existing state-of-the-art methods. Specifically, compared to traditional U-Net and other benchmark models, ACL-DUNet has shown improvements in both the Dice Similarity Coefficient (DSC) and Jaccard Similarity Index (JSC), demonstrating its outstanding performance.

2. Effective Integration of Multiple Attention Mechanisms: By integrating spatial attention gates, channel attention, and scale attention, the model more finely captures and utilizes spatial, channel, and scale information of images, enhancing the distinction between tumor regions and the background. This integrated approach not only boosts the model’s feature extraction capabilities but also optimizes the flow of information across the network.

3. Robustness and Generalization Capability: ACL-DUNet has proven its efficiency and accuracy in handling tumors of various sizes and shapes, maintaining high segmentation performance even in cases of variable image quality and indistinct tumor presentations. Additionally, the model has demonstrated good generalization across two different datasets, indicating its adaptability to various clinical settings and scanning equipment.

4. Computational Efficiency: Despite the model’s complex structure, through algorithm optimization and network design, ACL-DUNet achieves high accuracy while maintaining relatively fast computation speeds. This is particularly important for clinical applications as it supports rapid diagnostics, helping to improve medical efficiency.

5. Facilitating Clinical Diagnosis and Early Intervention: By providing high-precision tumor segmentation, this model can assist radiologists in accurately identifying and classifying breast tumors at early stages, thereby improving the diagnostic and treatment processes for patients. This is crucial for enhancing the survival rates and quality of life of breast cancer patients.

In summary, our study proposes a densely connected U-Net model with integrated attention mechanisms for precise segmentation of breast tumors in ultrasound images. Through extensive testing on two publicly available datasets, our model has demonstrated significant improvements in segmentation accuracy, feature extraction, and computational efficiency.

However, future research should focus on validating the model on larger and more diverse datasets, simplifying the model structure, optimizing training stability, and conducting extensive clinical validations. By addressing these limitations, our model can further enhance its applicability and effectiveness in clinical settings, ultimately improving breast cancer diagnosis and treatment.

References

  1. Yap M. H., Pons G., Marti J., Ganau S., Sentis M., Zwiggelaar R. et al., "Automated breast ultrasound lesions detection using convolutional neural networks," IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 4, pp. 1218–1226, Jul. 2018. pmid:28796627
  2. Cheng H., Shi X., Min R., Hu L., Cai X., and Du H., "Approaches for automated detection and classification of masses in mammograms," Pattern Recognition, vol. 39, no. 4, pp. 646–668, Jan. 2006.
  3. Bleicher R. J. et al., "Time to surgery and breast cancer survival in the United States," JAMA Oncol., vol. 2, no. 3, pp. 330–339, 2016. pmid:26659430
  4. Xian M., Zhang Y., Cheng H. D., Xu F., Zhang B., and Ding J., "Automatic breast ultrasound image segmentation: A survey," Pattern Recognit., vol. 79, pp. 340–355, Jul. 2018.
  5. Noble J. A. and Boukerroui D., "Ultrasound image segmentation: A survey," IEEE Trans. Med. Imag., vol. 25, no. 8, pp. 987–1010, Aug. 2006.
  6. Dong M., Lu X., Ma Y., Guo Y., Ma Y., and Wang K., "An efficient approach for automated mass segmentation and classification in mammograms," J. Digit. Imag., vol. 28, no. 5, pp. 613–625, Mar. 2015.
  7. Huang Yu-Len and Chen Dar-Ren (2005). Automatic contouring for breast tumors in 2-D sonography. In 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference (pp. 3225–3228). https://doi.org/10.1109/IEMBS.2005.1617163
  8. Liu B., Cheng H. D., Huang J., Tian J., Liu J., and Tang X. (2009). Automated segmentation of ultrasonic breast lesions using statistical texture classification and active contour based on probability distance. Ultrasound in Medicine & Biology, 35(8), 1309–1324. https://doi.org/10.1016/j.ultrasmedbio.2008.12.007
  9. Mughal B., Muhammad N., and Sharif M. (2019). Adaptive hysteresis thresholding segmentation technique for localizing the breast masses in the curve stitching domain. International Journal of Medical Informatics, 126, 26–34. pmid:31029261
  10. Aja-Fernández S., Curiale A. H., and Vegas-Sánchez-Ferrero G. (2015). A local fuzzy thresholding methodology for multiregion image segmentation. Knowledge-Based Systems, 83(1), 1–12. https://doi.org/10.1016/j.knosys.2015.02.029
  11. McClymont D., Mehnert A., Trakic A., Kennedy D., and Crozier S. (2014). Fully automatic lesion segmentation in breast MRI using mean-shift and graph-cuts on a region adjacency graph. Journal of Magnetic Resonance Imaging, 39(4), 795–804. pmid:24783238
  12. Daoud M. I., Atallah A. A., Awwad F., Al-Najjar M., and Alazrai R. (2019). Automatic superpixel-based segmentation method for breast ultrasound images. Expert Systems with Applications, 121, 78–96. https://doi.org/10.1016/j.eswa.2018.11.024
  13. Gharaei A., Amjadian A., and Shavandi A. (2021). An integrated reliable four-level supply chain with multi-stage products under shortage and stochastic constraints. International Journal of Systems Science: Operations & Logistics, 1–22. https://doi.org/10.1080/23302674.2021.1958023
  14. Baradaran Rezaei H., Amjadian A., Sebt M. V., Askari R., and Gharaei A. (2022). An ensemble method of the machine learning to prognosticate the gastric cancer. Annals of Operations Research. https://doi.org/10.1007/s10479-022-04964-1
  15. Amjadian A. and Gharaei A. (2022). An integrated reliable five-level closed-loop supply chain with multi-stage products under quality control and green policies: Generalised outer approximation with exact penalty. International Journal of Systems Science: Operations & Logistics, 9(3), 429–449. https://doi.org/10.1080/23302674.2021.1919336
  16. Dhungel N., Carneiro G., and Bradley A. P., "Deep learning and structured prediction for the segmentation of mass in mammograms," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., Berlin, Germany, 2015, pp. 605–612.
  17. Yuan Y., Chao M., and Lo Y.-C., "Automatic skin lesion segmentation using deep fully convolutional networks with Jaccard distance," IEEE Trans. Med. Imag., vol. 36, no. 9, pp. 1876–1886, Sep. 2017. pmid:28436853
  18. Li X. et al., "3D multi-scale FCN with random modality voxel dropout learning for intervertebral disc localization and segmentation from multi-modality MR images," Med. Image Anal., vol. 45, pp. 41–54, Apr. 2018. pmid:29414435
  19. Umehara K., Ota J., and Ishida T., "Application of super-resolution convolutional neural network for enhancing image resolution in chest CT," J. Digit. Imag., vol. 31, no. 4, pp. 441–450, Aug. 2017.
  20. Mohamed A. A., Berg W. A., Peng H., Luo Y., Jankowitz R. C., and Wu S., "A deep learning method for classifying mammographic breast density categories," Med. Phys., vol. 45, no. 1, pp. 314–321, Jan. 2017. pmid:29159811
  21. González G. et al., "Disease staging and prognosis in smokers using deep learning in chest computed tomography," Amer. J. Respiratory Crit. Care Med., vol. 197, no. 2, pp. 194–203, Jan. 2018. pmid:28892454
  22. Chougrad H., Zouaki H., and Alheyane O., "Deep convolutional neural networks for breast cancer screening," Comput. Methods Programs Biomed., vol. 157, pp. 19–30, Apr. 2018. pmid:29477427
  23. Long Jonathan, Shelhamer Evan, and Darrell Trevor. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  24. Ronneberger Olaf, Fischer Philipp, and Brox Thomas. "U-Net: Convolutional networks for biomedical image segmentation." Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Springer International Publishing, 2015.
  25. Yap M. H. et al., "Breast ultrasound lesions recognition: End-to-end deep learning approaches," J. Med. Imag., vol. 6, no. 1, Oct. 2018, Art. no. 011007. pmid:30310824
  26. Hu Y., Guo Y., Wang Y., Yu J., Li J., Zhou S., and Chang C., "Automatic tumor segmentation in breast ultrasound images using a dilated fully convolutional network combined with an active contour model," Med. Phys., vol. 46 (2019), pp. 215–228. pmid:30374980
  27. Almajalid R., Shan J., Du Y., and Zhang M., "Development of a deep-learning-based method for breast ultrasound image segmentation," in Proc. Int. Conf. Mach. Learn. Appl. (ICMLA), 2018, pp. 1103–1108.
  28. Shao H., Zhang Y., Xian M., Cheng H. D., Xu F., and Ding J., "A saliency model for automated tumor detection in breast ultrasound images," in Proc. Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 1424–1428.
  29. Iqbal Ahmed and Sharif Muhammad, "PDF-UNet: A semi-supervised method for segmentation of breast tumor images using a U-shaped pyramid-dilated network," Expert Systems with Applications, Volume 221, 2023, 119718.
  30. Ning Zhenyuan, et al. "CF2-Net: Coarse-to-fine fusion convolutional network for breast ultrasound image segmentation." arXiv preprint arXiv:2003.10144 (2020).
  31. Huang Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
  32. Oktay Ozan, et al. "Attention U-Net: Learning where to look for the pancreas." arXiv preprint arXiv:1804.03999 (2018).
  33. Hu Jie, Shen Li, and Sun Gang. "Squeeze-and-excitation networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
  34. Woo Sanghyun, et al. "CBAM: Convolutional block attention module." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
  35. Gu Ran, et al. "CA-Net: Comprehensive attention convolutional neural networks for explainable medical image segmentation." IEEE Transactions on Medical Imaging 40.2 (2020): 699–711. pmid:33136540
  36. Vakanski A., Xian M., and Freer P. E., "Attention-enriched deep learning model for breast tumor segmentation in ultrasound images," Ultrasound Med. Biol., vol. 46, no. 10, pp. 2819–2833, Oct. 2020. pmid:32709519
  37. Rodrigues Paulo Sergio (2017), "Breast Ultrasound Image", Mendeley Data, V1.
  38. Al-Dhabyani Walid, et al. "Dataset of breast ultrasound images." Data in Brief 28 (2020): 104863.
  39. He K., Zhang X., Ren S., and Sun J., "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034, 2015.
  40. Zhou Zongwei, et al. "UNet++: A nested U-Net architecture for medical image segmentation." Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, held in conjunction with MICCAI 2018, Granada, Spain. Springer International Publishing, 2018.
  41. Su R., et al. "MSU-Net: Multi-scale U-Net for 2D medical image segmentation." Front Genet 12: 58 (2021). pmid:33679900