Abstract
Manual image segmentation is time-consuming. An automatic, accurate method is required to segment multimodal brain tumors in context-rich three-dimensional medical images, one that can support clinical treatment decisions and surgical planning. However, achieving accurate segmentation of medical images with deep learning is challenging due to the diversity of tumors and the complex boundary interactions between sub-regions, while limited computing resources hinder the construction of efficient neural networks. We propose a feature fusion module based on a hierarchical decoupled convolution network and an attention mechanism to improve the performance of network segmentation. We replaced the skip connections of U-shaped networks with a feature fusion module to solve the category imbalance problem, thus contributing to the segmentation of more complicated medical images. We introduced a global attention mechanism to further integrate the features learned by the encoder and explore the context information. The proposed method was evaluated on the enhancing tumor, whole tumor, and tumor core sub-regions, achieving Dice similarity coefficients of 0.775, 0.900, and 0.827, respectively, on the BraTS 2019 dataset and 0.800, 0.902, and 0.841, respectively, on the BraTS 2018 dataset. The results show that our proposed method is inherently general and is a powerful tool for brain tumor image studies. Our code is available at: https://github.com/WSake/Feature-interaction-network-based-on-Hierarchical-Decoupled-Convolution.
Citation: Shen L, Zhang Y, Wang Q, Qin F, Sun D, Min H, et al. (2023) Feature interaction network based on hierarchical decoupled convolution for 3D medical image segmentation. PLoS ONE 18(7): e0288658. https://doi.org/10.1371/journal.pone.0288658
Editor: Gulistan Raja, University of Engineering & Technology, Taxila, PAKISTAN
Received: April 14, 2023; Accepted: June 30, 2023; Published: July 13, 2023
Copyright: © 2023 Shen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: In this paper, we used the BraTS 2018 and BraTS 2019 datasets issued by the Center for Biomedical Image Computing and Analytics. Since dataset access requires registration and an application, we cannot re-upload the complete datasets to other public sites; however, all the data are free and open and can be found at https://ipp.cbica.upenn.edu/. More detailed information can be found at https://www.med.upenn.edu/cbica/. In addition, we uploaded some of our segmentation results, which can be found at DOI: 10.6084/m9.figshare.23542575 as proof of the authenticity of our experimental data. Details can be found in the Supporting information files.
Funding: Longfeng Shen received the University Synergy Innovation Program of Anhui Province, China (GXXT-2022-033), the Anhui Provincial universities outstanding young backbone talents domestic visiting study and research project (Grant No. gxgnfx2019006), the projects of the Natural Science Foundation of the Anhui Provincial Department of Education (Grant No. KJ2019A0603), and the Open Laboratory project of Huaibei Normal University (Grant No. 2022sykf048). Dengdi Sun received the University Synergy Innovation Program of Anhui Province, China (GXXT-2022-002). Qianqian Meng received the projects of the Natural Science Foundation of the Anhui Provincial Department of Education (Grant No. KJ2020B13). Yingjie Zhang received the Open Laboratory project of Huaibei Normal University (Grant No. 2021sykf027) and the 2022 National Innovation and Entrepreneurship Training Program for College Students (202210373005). Qiong Wang received the Open Laboratory project of Huaibei Normal University (Grant No. 2022sykf049). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The purpose of medical image segmentation is to delineate regions of medical images, extract related features, and provide valuable information for the quantitative evaluation of disease, the formulation of treatment strategies, pathological research, clinical diagnosis, and patient prognosis. Recently, deep learning has been widely used in the fields of computer vision [1] and medical image processing.
Glioma [2] is a general term for tumors of the nervous system that originate from glial cells and neurons. As shown in Fig 1, it is the most common malignant intracranial tumor, accounting for 40%-50% of intracranial tumors. Gliomas can be classified into astrocytoma, glioblastoma, oligodendroglioma, and so on, according to the cell type of origin, each with different treatments and prognoses. Each type of glioma develops at a different age: most younger patients have astrocytomas, middle-aged patients have glioblastoma multiforme, and children have medulloblastoma. These tumors vary in shape and size.
In this figure, green represents GD-enhancing tumor (numerical label 2), yellow represents peritumoral edema (numerical label 1), and red represents the necrotic and non-enhancing tumor core (NCR/NET, numerical label 4), showing the differences in texture, size, and shape of primary brain tumours.
Accurate segmentation of brain tumors plays a vital role in early disease screening, evaluation of tumor progression, and surgical treatment planning. However, the location and shape of lesions differ considerably between patients, and experienced experts must spend substantial time and effort marking tumors manually. Automated segmentation methods can improve the efficiency of diagnosis and provide a visual representation of in vivo anatomy or function, which is essential for clinical analysis and medical intervention. However, this task faces the following challenges,
- The appearance, location, and shape of gliomas vary from patient to patient, making it difficult to accurately locate and segment smaller tumors.
- Brain tumors and normal tissues interpenetrate, making the borders blurred and indistinguishable.
- Imaging noise, low image contrast, unbalanced categories, and limited dataset sizes complicate this task.
Traditional brain tumor segmentation methods are mainly based on random forest (RF) [3] classification, logistic regression (LR), and Markov random fields (MRF) [4]. Most models are based on RF classifiers, and the segmentation task is modeled as a regularized stratified conditional random field (CRF) in which the RF serves as the classifier. According to local intensity information, each point in the brain is assigned a specific tissue category, and these initial probability estimates are then input into the RF classifier together with the multimodal magnetic resonance imaging (MRI) data to segment the brain tumor tissue. Subsequently, higher-performance 3D convolutional neural networks (3D-CNNs) [5] emerged.
Many attempts have been made to improve the ability of deep convolutional neural networks in medical image segmentation. For example, the encoder-decoder structure has been improved to varying degrees; end-to-end learning [6] has been used to maintain low-level features and obtain clear segmentation boundaries; atrous convolution [7] and multi-scale information effectively expand the receptive field; and attention mechanisms [8] have been introduced into segmentation models, making it possible to focus on particular locations and channels. More novel methods add a dual-stream pyramid module and a context-aware module [9] to the encoder-decoder structure to avoid local feature loss, or embed the attention mechanism in the convolution module to further refine spatial and texture features [10], making full use of the complementary advantages of three-dimensional and two-dimensional convolution. For brain tumor segmentation, UNet [11] fully demonstrates the effectiveness of the U-shaped structure with outstanding results.
In terms of imaging methods, medical images are more diverse than natural images. Moreover, medical images generally contain a large amount of noise due to the influence of imaging equipment, imaging principles, and individual differences. Image details must be preserved while noise is suppressed, which makes lesion segmentation difficult. Although some 2D CNN-based methods [12-14] have achieved impressive performance, most clinical imaging data are volumetric, and these models ignore critical 3D spatial information.
In 3D medical image segmentation tasks, 3D models [15-17] have demonstrated significant improvements over 2D models due to their ability to exploit the contextual information contained across slices, which greatly helps segmentation performance. However, compared with a conventional 2D CNN, multi-layer 3D convolution incurs a higher computational cost due to the additional dimension. To address this problem, some attempts have been made to reduce the number of learnable network parameters by using lightweight architectures [18, 19]. In overall performance, however, these efficient models cannot match comprehensive models.
We therefore revisit the skip connections and attention of the U-shaped structure. UNet uses simple skip connections to build a model with global multi-scale context information for accurate segmentation of medical images, but a simple skip connection cannot effectively aggregate multi-scale features, and the encoder cannot mine enough information on its own. For these reasons, learning important local features at multiple scales, obtaining semantic dependencies, and fusing the features learned by the encoder and decoder become key problems. In this study, we redesigned the structure of the skip connection and introduced a context-guided attentive conditional random field (CGA-CRF) module to connect the features of the encoder and decoder. We introduced the feature fusion module into the skip connection to solve the class imbalance problem and improve the segmentation of complex medical images. We also introduced the global attention mechanism (GAM) [20] to further integrate the features learned by the encoder and explore the local context. The GAM module can reduce information diffusion while enabling feature interaction, which effectively addresses the tumor variability problem.
The main contributions of this study are summarized as follows,
- Through simple analysis of the skip connection method, we find that the traditional simple connection method cannot realize the mutual learning between features.
- We propose a new approach by introducing a feature interaction module in the skip connection of the U-shaped network to enable information interaction and capture more accurate semantic information.
- We introduce a lightweight attention mechanism into the feature interaction structure of the U-shaped network for better feature learning and accurate segmentation of small tumors.
The rest of this article is structured as follows. The second section reviews related work, and the third section introduces our method in detail. The fourth section reports the dataset descriptions, experimental results, and performance analysis, followed by our conclusion in the fifth section.
Related work
The application of deep neural networks to brain tumor segmentation has become a research focus in computer vision because of their powerful automatic feature extraction and discrimination capabilities in supervised learning. In this section, we introduce recent methods related to glioma segmentation. Depending on whether training samples are labeled, existing glioma segmentation methods can be classified as supervised, semi-supervised, unsupervised, and hybrid learning; supervised learning algorithms are the dominant approach. In the past few years, various deep neural network models for computer vision tasks have been proposed, such as ResNet [21] and DenseNet [22], which provide a new way to solve MRI brain image segmentation problems and greatly contribute to the development of deep learning-based brain tumor diagnosis. Brain tumor segmentation methods based on unsupervised learning include threshold, region, and active contour model methods, as well as clustering methods such as K-means clustering [23], Bayesian fuzzy clustering [24], fuzzy C-means clustering, and superpixel clustering [25]. Early supervised learning methods include support vector machines [26] and RF [27].
Methods based on UNet
The traditional methods mentioned above require substantial manual intervention. Since 2015, UNet has adopted a symmetrical encoder-decoder architecture with skip connections, which gradually restores the down-sampled feature map to its original size, thus realizing pixel-level dense prediction of medical images. UNet variants subsequently attracted much attention and were further applied in medical image segmentation. UNet++ [28] reduces the semantic gap between encoder and decoder sub-networks by introducing a series of densely connected convolutions and achieves better segmentation performance. 3D-UNet [29] replaces all 2D operations in UNet with their 3D counterparts, such as 3D convolution, 3D pooling, and 3D up-sampling, realizing better segmentation of medical image volumes. RA-UNet [30] proposes a 3D hybrid residual attention-aware segmentation method to precisely extract and segment tumors from the volumes of interest. nnUNet [31] removes many of the excess bells and whistles from proposed network designs and focuses on pre-processing and post-processing, achieving state-of-the-art performance in six recognized segmentation challenges. Probabilistic UNet [32] combines UNet with the conditional variational autoencoder (CVAE) to give UNet the ability to quantify prediction uncertainty. Partially reversible UNet [33] proposes an architecture that significantly reduces memory consumption and increases network depth to improve segmentation accuracy. 3D U2-Net [34] introduces depthwise-separable convolution to explore a promising general architecture. The 3D dilated multi-fiber network [35] leverages 3D multi-fiber units consisting of lightweight 3D convolutional networks to significantly reduce computational costs.
Attention mechanism
In most classical models, such as UNet, the same low-level information is extracted repeatedly at the beginning, which leads to redundant use of information. Attention mechanisms can be used to segment the features of the synapse area and suppress other noisy parts [36]. To enhance the semantic information of the feature map, attention-UNet [37] introduces a channel attention mechanism into the UNet network, which compresses the features generated by UNet channel by channel, computes a weight for each compressed channel, and then multiplies the weights with the original features to obtain the final features. GAU-Net [38] proposes a global attention mechanism that integrates a channel attention module and a spatial attention module to obtain good segmentation performance. 3D attention UNet [39] adopts the 3D UNet architecture and combines channel and spatial attention with the decoder network for segmentation. SENet [40] proposes squeeze and excitation operations: the squeeze operation obtains a global feature descriptor, and the excitation operation captures the relationships between channels, improving the sensitivity of the model to channel features. Non-local neural networks [41] use non-local operations as simple, efficient, and general components to capture long-distance dependencies in deep neural networks.
Different skip connections
Skip connections are widely used to improve the performance and convergence of deep neural networks. The skip connection mechanism was first proposed in UNet, aiming to bridge the semantic gap between the encoder and decoder, and has proved effective in recovering the fine-grained details of the target objects. The fully convolutional network (FCN) also uses skip connections; the difference is that FCN's skip connections are element-wise additions, while UNet's are realized by channel concatenation. UNet 3+ [42] uses full-scale skip connections and deep supervision to combine high-level and low-level semantics from feature maps at different scales, thus improving accuracy. MultiResUNet [43] observes that the feature maps obtained by the encoder cannot be directly concatenated with the feature maps output by the decoder because of the gap between them, so some convolutional layers are added along the skip-connection path. Liu et al. [44] analyze and discuss some limitations of skip connections and of batch normalization, and propose a strategy of adaptively adjusting the input scale through recursive skip connections and layer normalization, which improves the performance of skip connections.
Proposed method
In this section, starting with the lightweight hierarchical decoupled convolution (HDC) module [45], we detail a multi-modal brain tumor segmentation framework, as shown in Fig 2. We combine the feature interaction module with the attention module and then extend HDCNet with the context-guided attentive CRF fusion module to effectively integrate contextual semantic features and attentive visual features.
HDC module
Processing 3D medical images using deep structures, especially networks with complex self-attention, is often limited by large amounts of memory and computational power. Although the number of parameters can be greatly reduced using 2D convolution, they have inherent limitations in capturing rich spatial contexts. Within limited resource constraints, designing efficient kernels with low redundancy by decomposing standard convolutions, such as depth-separable convolutions [46], group convolutions [47], and decomposition convolutions is an effective way to address this problem.
Thus, Luo et al. [45] proposed hierarchical decoupled convolution. As shown in Fig 3, the HDC module does not compute over the spatial and channel dimensions simultaneously as 3D convolution does; instead, the standard convolution is decoupled along the spatial and channel dimensions. Building on this, to reduce computational complexity and encode cues from multiple fields of view with minimal sacrifice of spatial context awareness, we use the HDC module to decompose each 3D spatial convolution into two complementary 2D convolutions, introducing view-decoupled convolution. A hierarchical group-decoupled convolution is then applied to the 2D convolutions on the axial view in the channel domain; that is, parallel axis-view convolutions are applied to groups of feature channels with hierarchical connections. The main convolutions applied to the parallel branches hierarchically extract multi-scale features on the focused view of the 3D volume, while the sub-convolution after the multi-branch module mixes the multi-scale outputs of the main convolutions and extracts spatial context features on the complementary view.
Experimental results show that, compared with the two-dimensional method, using an HDC module instead of a 3D convolution can extract more semantic features with a small amount of memory, and the hierarchical structure can make the network better use context information, thus obtaining more stable segmentation performance.
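The view-decoupling idea can be sketched in PyTorch as follows. The specific kernel orientations (an axial 1 × 3 × 3 convolution followed by a complementary 3 × 3 × 1 convolution) and the grouping are illustrative assumptions, not the exact HDC configuration of [45]:

```python
import torch
import torch.nn as nn

class DecoupledConv3d(nn.Module):
    """Sketch of view-decoupled 3D convolution: a full 3x3x3 kernel is
    replaced by two complementary 2D convolutions that together cover the
    3D receptive field at a lower parameter and FLOP cost."""
    def __init__(self, in_ch, out_ch, groups=1):
        super().__init__()
        # main convolution on the focused (axial) view: kernel (1, 3, 3)
        self.axial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                               padding=(0, 1, 1), groups=groups)
        # sub-convolution on the complementary view mixes across slices
        self.complementary = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 3, 1),
                                       padding=(1, 1, 0), groups=groups)

    def forward(self, x):                     # x: (N, C, D, H, W)
        return self.complementary(self.axial(x))

x = torch.randn(1, 8, 16, 32, 32)
y = DecoupledConv3d(8, 16)(x)                 # same spatial size, 16 channels
```

A full 3 × 3 × 3 convolution needs 27·Cin·Cout weights, whereas the pair above needs 9·Cin·Cout + 9·Cout² while still aggregating context from all three axes.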
Feature interaction module
Unbalanced categories and blurred boundaries are the difficult issues in medical brain tumor segmentation. In clinical diagnosis, experienced doctors usually determine the tumor boundary by the context information of its surrounding environment.
Projection with adaptive sampling.
Adaptive sampling projection is a sampling-based image-processing technique used to improve image quality. Pixels are sampled one by one, and the sampling density is adjusted adaptively according to local image characteristics: in areas of high detail and complexity, increasing the sampling density improves their effective resolution, making the image clearer. Adaptive sampling projection decomposes the input image into multiple sub-images and samples a set of reconstruction points in each sub-image. The gray value of each reconstructed point is then computed from the position and color value of the sampling points, and the image is output. We use this adaptive sampling strategy to project the original feature into the feature interaction space, generating a projected feature.
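As an illustration of the sampling idea, the following sketch projects a feature volume onto K interaction nodes, sampling more densely where local activation energy is high. The energy-based density and the function name are assumptions for illustration, not the paper's exact sampling rule:

```python
import numpy as np

def adaptive_sampling_projection(feat, num_nodes, rng=np.random.default_rng(0)):
    """Project a (C, X, Y, Z) feature volume onto num_nodes interaction
    nodes; voxels with higher activation energy are sampled more densely
    (an assumed proxy for 'high detail and complexity' regions)."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)                 # (C, N): voxels as columns
    energy = np.abs(flat).sum(axis=0)          # per-voxel activation energy
    density = energy / energy.sum()            # adaptive sampling density
    idx = rng.choice(flat.shape[1], size=num_nodes, replace=False, p=density)
    return flat[:, idx].T                      # (K, C) projected node features

feat = np.random.default_rng(1).random((4, 8, 8, 8))
nodes = adaptive_sampling_projection(feat, num_nodes=16)
```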
Interaction graph reasoning.
Interaction graph reasoning represents multiple entities, concepts, and their relationships graphically; it can also be used for automatic reasoning and decision-making, helping to manage and control feature information. We feed the projected feature XP into the interaction graph reasoning module. Let AG denote the graph adjacency matrix on K nodes and WG the weight matrix; the graph convolution operation is expressed as follows,

XG = σ(AG XP WG) (1)

ÃG = I + AG (2)

where σ(·) is the sigmoid activation function. First, Laplacian smoothing is applied and the adjacency matrix is updated to ÃG, as in Eq (2), so that the node features are propagated throughout the graph. In practice, we use 1 × 1 convolution layers to implement AG and WG.
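A minimal sketch of the graph convolution of Eq (1), assuming the adjacency and weight matrices are realized as learnable 1 × 1 convolutions and the residual term supplies the identity of the Laplacian-smoothed adjacency:

```python
import torch
import torch.nn as nn

class InteractionGraphReasoning(nn.Module):
    """Sketch of Eq (1): X_G = sigmoid(A~_G X_P W_G), with A_G and W_G both
    realized as 1x1 convolutions over node and channel dimensions."""
    def __init__(self, num_nodes, feat_dim):
        super().__init__()
        self.adjacency = nn.Conv1d(num_nodes, num_nodes, kernel_size=1)  # A_G
        self.weights = nn.Conv1d(feat_dim, feat_dim, kernel_size=1)      # W_G

    def forward(self, x_p):                    # x_p: (N, K, C) projected nodes
        h = x_p + self.adjacency(x_p)          # (I + A_G) X_P, the smoothing step
        h = self.weights(h.transpose(1, 2)).transpose(1, 2)  # right-multiply W_G
        return torch.sigmoid(h)                # sigma(...) of Eq (1)

x = torch.randn(2, 16, 32)                     # batch of 2, K=16 nodes, C=32
out = InteractionGraphReasoning(16, 32)(x)
```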
Context guided attentive CRF fusion module.
The CGA-CRF method proposed by Liu et al. captures high-dimensional, discriminative context features from the encoder stage in both the convolution space and the feature interaction graph. A context-guided attentive conditional random field is then used to selectively aggregate the features generated from different contexts and learn to generate optimal features, which are combined with the decoder to accurately segment tumors. To make the best use of the features learned by the encoder, we apply the CGA-CRF module to HDCNet, using the feature interaction graph to model the relationship between lesion tissue and its surroundings, and selectively aggregating down-sampled features combined with skip connections to accurately locate brain tumors, segment tumor boundaries, and reduce boundary blur.
As shown in Fig 4, we follow the feature interaction graph module in CANet and project feature X from the encoder using adaptive sampling projection, generating XP. The graph context information XG is then generated using the feature interaction graph to distinguish the tumor boundary. To make the network attend to context information without losing tumor information, we add a new attention module after the convolution branch feature XC. The attention mechanism enhances the interaction between dimensions while preserving channel and spatial information. Given the convolution context branch feature map XC, the attention module derives attention maps along the two independent dimensions of channel and space in turn, then multiplies the attention maps by the input feature map for adaptive feature refinement, obtaining XA. Experiments show that the attention mechanism induces the network to focus correctly on the tumor targets.
Research [42] shows that simply fusing features from different sources using channel-level concatenation or element-level summation oversimplifies the relationship between the feature maps and may lead to information loss. To make full use of the generated context information XG and the tumor features XA, we input XG and XA into the context-guided attentive CRF fusion module, which has powerful reasoning ability. We can thus learn the hidden representation of the features encoded by the network backbone and improve the generalization ability of the segmentation model. In addition, the potential features optimized by the conditional random field model can be learned to realize the final feature fusion. CGA-CRF uses the context information XG and the attentive visual feature XA to generate the final feature XF. To make the network retain the original low-level features, we pass feature X from the encoder to the decoder through a skip connection to assist XF in generating the best segmentation map for the MRI image.
Attention module
The visual attention mechanism is an innate ability of the human brain. Research on attention mechanisms aims to achieve selective attention to certain things while ignoring others in deep neural networks. In recent years, various attention mechanisms have been investigated to make models aware of the importance of different local information in images and to improve the overall performance of computer vision tasks. The convolutional block attention module (CBAM) [48] sequentially applies two sub-modules of channel and spatial attention. Given an intermediate feature map as input, CBAM successively infers a 1D channel attention map and a 2D spatial attention map. GAM adopts the sequential channel-spatial attention of CBAM but redesigns its sub-modules, improving the global attention performance of deep neural networks by reducing information diffusion and amplifying global interactive representations.
To enhance the focus on the target tumor and retain information that amplifies interactions across dimensions, we introduced the GAM, which combines a channel attention sub-module based on 3D permutation with a multi-layer perceptron (MLP) and a convolutional spatial attention sub-module. The channel attention sub-module uses dimensional permutation to retain information across dimensions and an MLP to amplify cross-dimensional channel-space dependencies. The spatial attention sub-module uses two convolutional layers for spatial information fusion, making the channels more aware of spatial information. Given the input feature map F1, the intermediate state F2 and output F3 are defined as follows,
F2 = Mc(F1) ⊗ F1 (3)

F3 = Ms(F2) ⊗ F2 (4)
The whole calculation process is shown in Fig 5, where ⊗ denotes element-wise multiplication; Mc and Ms are the channel and spatial attention maps, respectively.
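Eqs (3) and (4) can be sketched as follows. The MLP reduction ratio r and the 7 × 7 × 7 spatial kernels are assumptions carried over from the 2D GAM description, so this 3D variant is illustrative:

```python
import torch
import torch.nn as nn

class GAM3D(nn.Module):
    """Sketch of the global attention mechanism:
    F2 = Mc(F1) * F1  (channel attention via permutation + MLP, Eq 3)
    F3 = Ms(F2) * F2  (spatial attention via two convolutions, Eq 4)."""
    def __init__(self, channels, r=4):
        super().__init__()
        self.mlp = nn.Sequential(                    # channel sub-module
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )
        self.spatial = nn.Sequential(                # spatial sub-module
            nn.Conv3d(channels, channels // r, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // r, channels, kernel_size=7, padding=3),
        )

    def forward(self, f1):                           # f1: (N, C, D, H, W)
        # permute so the MLP mixes channels at every spatial position
        att_c = self.mlp(f1.permute(0, 2, 3, 4, 1))  # (N, D, H, W, C)
        f2 = torch.sigmoid(att_c).permute(0, 4, 1, 2, 3) * f1   # Eq (3)
        f3 = torch.sigmoid(self.spatial(f2)) * f2               # Eq (4)
        return f3

f1 = torch.randn(1, 8, 4, 6, 6)
f3 = GAM3D(8)(f1)                                    # same shape as the input
```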
Network
As shown in Fig 2, we introduce the interaction hierarchical decoupled convolution network with the classical encoder-decoder architecture. The feature interaction module consists of the CGA-CRF module and the GAM module. The former is used to extract the context information between tumor boundaries and generate rich, consistent pixel-level features, while the latter introduces channel and spatial attention sub-modules to locate the tumor and further enhance the feature representation ability.
The interaction hierarchical decoupled convolution network is a lightweight variant of 3D UNet with a symmetric encoder-decoder structure and skip connections linking the two paths. As before, we use the HDC module instead of 3D convolution to efficiently explore multi-scale and multi-view spatial contexts. To alleviate the problem of label imbalance, we first crop the original image into a 128 × 128 × 128 voxel volume used as the input, then apply a periodic down-shuffling (PDS) operation [49] before down-sampling. The PDS operation rearranges a high-resolution input tensor Toriginal of size Cin × H × W × D into a low-resolution tensor Toutput, where H × W × D is the spatial size of Toriginal and Cin is the number of channels. Each spatial dimension of Toutput is half that of the input, and the number of output channels C′ is 8 × Cin. The specific operation of PDS is described as follows,
c′ = c + Cin · ((x mod 2) + 2 · (y mod 2) + 4 · (z mod 2)) (5)

x′ = ⌊x / 2⌋ (6)

y′ = ⌊y / 2⌋ (7)

z′ = ⌊z / 2⌋ (8)

where (c′, x′, y′, z′) are the coordinates in Toutput of the element at coordinates (c, x, y, z) in Toriginal.
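A dependency-free sketch of the PDS rearrangement, assuming a standard 3D space-to-depth channel ordering consistent with C′ = 8 × Cin; the exact channel ordering in [49] may differ:

```python
import numpy as np

def periodic_down_shuffle(t):
    """Rearrange a (C, X, Y, Z) tensor into (8*C, X/2, Y/2, Z/2) without
    discarding any values: each of the eight spatial parity offsets
    (i, j, k) becomes its own block of C channels."""
    c, x, y, z = t.shape
    out = np.empty((8 * c, x // 2, y // 2, z // 2), dtype=t.dtype)
    for i in range(2):
        for j in range(2):
            for k in range(2):
                block = i + 2 * j + 4 * k          # assumed channel offset
                out[block * c:(block + 1) * c] = t[:, i::2, j::2, k::2]
    return out

t = np.arange(2 * 4 * 4 * 4).reshape(2, 4, 4, 4)
out = periodic_down_shuffle(t)                     # (16, 2, 2, 2)
```

Because PDS is a pure rearrangement, it halves each spatial dimension without the information loss of pooling.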
A three-dimensional convolution with a kernel size of 3 × 3 × 3 and a stride of 1 is used in the first stage of the encoder; a leaky rectified linear unit (leaky ReLU) with a slope of 0.01 and synchronized normalization are applied after every convolution operation. In the feature encoding stage, we use the HDC module in the last three encoding units to convey multi-scale information, benefiting from the unique perception ability of hierarchical decoupled convolutions. In the decoding stage, at the middle two scales, we concatenate the high-resolution features of the encoder with the features of the decoder. At the deepest scale, we replace the original skip connection with the more complex feature interaction module to make the network learn more accurate details. Trilinear interpolation is used for up-sampling in the last layer of the network, and high-resolution segmentation results are then output by softmax.
Experimental results and analysis
Datasets
The Multimodal Brain Tumor Segmentation (BraTS) Challenge is a global medical image segmentation challenge co-organized by the International Association for Medical Image Computing and Computer-Assisted Intervention (MICCAI) that focuses on evaluating automated segmentation algorithms for brain tumors. We evaluate the proposed method on clinical data from the BraTS 2018 and 2019 datasets. BraTS 2018 consists of 285 training cases and 66 validation cases, and BraTS 2019 consists of 335 training cases and 125 validation cases. The ground truth for all training cases is public; the ground truth for validation cases is reserved for online evaluation. The ground-truth segmentation consists of four labels: background, necrotic and non-enhancing tumor core, peritumoral edema, and enhancing tumor. Although several different tumor labels are provided, they are grouped into three clinically distinct tumor sub-regions for evaluation: whole tumor (WT), tumor core (TC), and enhancing tumor (ET). Each case contains the four different modalities described above (T1, T1ce, T2, Flair). The provided data were pre-processed by the organizers, including co-registration to the same anatomical template, interpolation to a uniform isotropic resolution (1 mm3), and skull stripping. All public data can be found at: https://ipp.cbica.upenn.edu/.
Experiment details
We implemented the proposed method in PyTorch, and all experiments were carried out on two parallel Tesla T4 GPUs. During training, we optimized the network with the Adam algorithm, a batch size of 8, and a weight decay of 5 × 10−4. The initial learning rate was set to 1 × 10−4 and decayed on a polynomial schedule. To take advantage of the spatial background information of the images, we used 3D volumes, cropped and scaled from 240 × 240 × 155 to 128 × 128 × 128. To expand the training data, we used the following augmentation techniques: (1) random mirror flips in the axial, coronal, and sagittal planes with probability 0.5; (2) random rotations between [−10°, 10°]; (3) random intensity shifts in [−0.1, 0.1] and scaling in [0.9, 1.1]. The L2 norm was used for model regularization with a weight decay rate of 10−5. During the testing phase, we zero-padded the MRI volumes from 240 × 240 × 155 to 240 × 240 × 160 in the depth dimension and used them as network input. Handling the extreme foreground-background imbalance in medical image segmentation is a major challenge, so selecting an appropriate loss function is essential. The generalized Dice loss has been shown to handle the class imbalance of brain tumors well; its mathematical formulation is as follows,
L_GDice = 1 − 2 · (Σ_{l=1..L} w_l Σ_{n=1..N} p_ln t_ln) / (Σ_{l=1..L} w_l Σ_{n=1..N} (p_ln + t_ln)) (9)

where p_ln represents the predicted probability of class l at the n-th voxel, t_ln represents the corresponding ground-truth label, and w_l represents the weight of each class, defined as w_l = 1 / (Σ_{n=1..N} t_ln)², where N is the number of voxels and L is the number of classes.
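A small numpy sketch of the generalized Dice loss of Eq (9); the epsilon term is an added assumption for numerical stability:

```python
import numpy as np

def generalized_dice_loss(p, t, eps=1e-6):
    """Generalized Dice loss of Eq (9).
    p: predicted probabilities, t: one-hot ground truth, both shaped (L, N)."""
    w = 1.0 / (t.sum(axis=1) ** 2 + eps)        # w_l = 1 / (sum_n t_ln)^2
    numerator = (w * (p * t).sum(axis=1)).sum()
    denominator = (w * (p + t).sum(axis=1)).sum()
    return 1.0 - 2.0 * numerator / (denominator + eps)

t = np.array([[1., 0., 0.], [0., 1., 1.]])      # 2 classes, 3 voxels
loss_perfect = generalized_dice_loss(t, t)      # near 0 for a perfect prediction
loss_worst = generalized_dice_loss(1 - t, t)    # near 1 with no overlap
```

The 1/(Σ t)² weighting boosts the contribution of small classes, which is why this loss counters the foreground-background imbalance described above.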
Evaluation metrics
We evaluated network performance using the Dice similarity coefficient (%) and the 95th-percentile Hausdorff distance (HD95) as quantitative metrics. The Dice score measures the volume overlap between the predicted mask and the ground truth and is sensitive to the interior of the mask, while HD95 is computed between the boundaries of the prediction and the ground truth and therefore measures boundary segmentation accuracy. They are defined as,
$$\text{Dice} = \frac{2TP}{2TP + FP + FN} \qquad (10)$$

$$\text{HD95}(P, T) = \max\left\{\underset{p \in \partial P}{\mathrm{P}_{95}}\, \min_{t \in \partial T} d(p, t),\ \underset{t \in \partial T}{\mathrm{P}_{95}}\, \min_{p \in \partial P} d(p, t)\right\} \qquad (11)$$

where TP, FP, and FN are the numbers of true positive, false positive, and false negative voxels, respectively. For HD95, $P$ denotes the prediction and $T$ the ground truth, $\partial$ denotes the boundary, $d(\cdot,\cdot)$ is the Euclidean distance, and $\mathrm{P}_{95}$ is the 95th percentile.
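The two metrics can be sketched directly from Eqs. (10) and (11). The following NumPy sketch is illustrative (a brute-force pairwise distance rather than an optimized boundary extraction); `pred_pts` and `truth_pts` are assumed to be arrays of boundary-point coordinates.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice of Eq. (10) from binary masks: 2TP / (2TP + FP + FN)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn)

def hd95(pred_pts, truth_pts):
    """Symmetric 95th-percentile Hausdorff distance of Eq. (11)
    between boundary point sets P (prediction) and T (ground truth)."""
    # Pairwise Euclidean distances between the two point sets.
    d = np.linalg.norm(pred_pts[:, None, :] - truth_pts[None, :, :], axis=-1)
    d_pt = d.min(axis=1)  # each predicted point to its nearest truth point
    d_tp = d.min(axis=0)  # each truth point to its nearest predicted point
    return max(np.percentile(d_pt, 95), np.percentile(d_tp, 95))
```

Taking the 95th percentile instead of the maximum makes the boundary distance robust to a few outlier voxels, which is why HD95 is preferred over the plain Hausdorff distance in BraTS evaluation.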
Results
We validated the proposed method on the BraTS 2019 validation dataset and compared it with classical methods. The performance comparison is shown in Tables 1 and 2. The inherent characteristics of gliomas make segmentation of the ET and TC subregions more challenging than whole-tumor segmentation; our proposed method scored 77.5%, 90.0%, and 82.7% for ET, WT, and TC, respectively. However, because our baseline model is a pseudo-3D model, there is still a gap between our method and the best fully 3D methods, such as the winning method of the BraTS 2019 competition [16], although our model has far fewer parameters. On the BraTS 2019 dataset, our Dice scores for ET, WT, and TC were 6.6%, 3%, and 5% higher than those of ResU-Net [50]; 3.8%, 0.6%, and 2% higher than those of 3D UNet [51]; and 1.4%, 1%, and 4.8% higher than those of 3D FCN [52], respectively.
Thus, our algorithm is more efficient and achieves comparable segmentation accuracy with fewer parameters. We also visually compared brain tumor segmentation results from various methods, including DMFNet and HDC-Net; Fig 6 shows the results of our method. The feature interaction module allows the model to generate better segmentations, especially at class boundaries.
From left to right: (a,b) Flair and T2 slices, (c,f) 2D ground truth overlaid on T2 slices, ET: Yellow; TC: Yellow + Red; WT: Yellow + Red + Green.
Ablation study
Quantitative evaluation: The main contributions of this study are (1) the addition of a feature interaction module in place of the skip connections of the encoder-decoder structure, motivated by the limited information interaction of the UNet skip connection, and (2) the introduction of a lightweight attention mechanism in the feature interaction structure to better learn tumor features and improve segmentation accuracy. To evaluate the contribution of each component, we performed ablation experiments on the BraTS 2019 dataset. In Table 3, with other settings such as network depth, parameter size, and training strategy unchanged, we report the performance of the proposed method on a local validation set with and without the feature interaction and attention mechanisms. The results show that adding each module separately improves segmentation of ET, WT, and TC, and that combining the two modules further improves performance on all metrics. Compared to the original baseline model, our approach improves the Dice scores for ET, WT, and TC by 1.5%, 0.5%, and 0.7%, respectively, and reduces the corresponding Hausdorff distances by 0.15, 3.05, and 0.90.
Performance is measured in Dice (%) and Hausdorff distance (mm).
Qualitative comparison: Because the ground truth labels of the BraTS validation set are not publicly available, we randomly selected cases from the training set to form a local validation set for quantitative evaluation. The segmentation results and the 3D visualization are shown in Fig 6. Compared with the baseline network and DMFNet, the results generated by our method are closer to the ground truth; in particular, our method produces better tumor boundaries. In Fig 7, we show segmentation results from different imaging angles; the last column is a 3D segmentation visualization. Both the quantitative evaluation in Table 3 and the qualitative comparison in Fig 7 demonstrate the reliability and effectiveness of our proposed method. We also visualized the feature maps of the proposed method. As can be seen in Fig 8, the feature maps generated with the attention module focus more on the target region, which facilitates segmentation. In addition, to further demonstrate the advantages of the model, we analyzed the model parameters in detail, as shown in Table 4. Compared with the traditional 3D-UNet, our method has fewer parameters and higher segmentation accuracy. Although the performance of our model is slightly lower than that of the much larger NVDLMED model [17], our parameter count is far smaller.
Discussion and conclusion
Segmentation of brain tumors plays an important role in the diagnosis, treatment planning, and evaluation of brain tumors. In this study, a comprehensive approach was adopted to integrate the features learned by the encoder in order to obtain more accurate semantic information and further enhance the network's ability to accurately locate and segment tumors.
Compared with traditional methods, our method has the following advantages. First, we use the HDC module to reduce GPU memory requirements during training. Second, we replace the traditional skip connection structure to enable mutual learning among features. Third, the feature interaction module is introduced into the U-shaped network to better segment brain tumor regions with blurred contours. Finally, we introduce an attention mechanism so that complementary features across modalities can be learned and the network can focus on the most useful features.
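To make the idea of attention-gated feature fusion at a skip connection concrete, the following is a deliberately simplified NumPy sketch in the spirit of the squeeze-and-excitation gate [40]. It is an illustrative stand-in, not the exact feature interaction module of this paper: the channel gate here is parameter-free, whereas a real module would use learned weights.

```python
import numpy as np

def channel_attention_fusion(enc_feat, dec_feat):
    """Fuse encoder and decoder feature maps with a channel gate,
    instead of the plain concatenation of a U-Net skip connection.
    Both inputs have shape (C, D, H, W)."""
    fused = enc_feat + dec_feat
    # Squeeze: global average pooling over the spatial dimensions.
    squeeze = fused.mean(axis=(1, 2, 3))        # (C,)
    # Excite: a sigmoid gate re-weights each channel, letting the
    # network emphasize the most useful feature channels.
    gate = 1.0 / (1.0 + np.exp(-squeeze))       # (C,)
    return fused * gate[:, None, None, None]
```

In a real network the squeeze vector would pass through a small learned MLP before the sigmoid, and spatial attention could be applied analogously along the spatial axes.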
Because medical image segmentation remains challenging, the segmentation results of our method are less stable than those of large models, and complex environmental factors must be considered in practical applications, so extensive experiments are needed to verify its practicality. In addition, the segmentation efficiency of the network is also important for future practice. In future work, we will exploit the characteristics of multimodal data and fuse different modalities to develop a more effective and accurate segmentation model, and we will extend this method to verify its applicability to various other segmentation tasks.
References
- 1. Wang R., Lei T., Cui R., Zhang B., Meng H., and Nandi A. K., “Medical image segmentation using deep learning: A survey,” IET Image Processing, vol. 16, no. 5, pp. 1243–1267, 2022.
- 2. Furnari F. B., Fenton T., Bachoo R. M., Mukasa A., Stommel J. M., Stegh A., et al., “Malignant astrocytic glioma: genetics, biology, and paths to treatment,” Genes & development, vol. 21, no. 21, pp. 2683–2710, 2007. pmid:17974913
- 3. Mahapatra D., “Analyzing training information from random forests for improved image segmentation,” IEEE Transactions on Image Processing, vol. 23, no. 4, pp. 1504–1512, 2014. pmid:24569439
- 4. Park S. H., Lee S., Yun I. D., and Lee S. U., “Hierarchical mrf of globally consistent localized classifiers for 3d medical image segmentation,” Pattern Recognition, vol. 46, no. 9, pp. 2408–2419, 2013.
- 5. Kamnitsas K., Ledig C., Newcombe V. F., Simpson J. P., Kane A. D., Menon D. K., et al, “Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation,” Medical image analysis, vol. 36, pp. 61–78, 2017. pmid:27865153
- 6. Badrinarayanan V., Kendall A., and Cipolla R., “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481–2495, 2017. pmid:28060704
- 7. Chen L.-C., Papandreou G., Kokkinos I., Murphy K., and Yuille A. L., “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2017. pmid:28463186
- 8. Sinha A. and Dolz J., “Multi-scale self-guided attention for medical image segmentation,” IEEE journal of biomedical and health informatics, vol. 25, no. 1, pp. 121–130, 2020.
- 9. Liu Z., Tong L., Chen L., Zhou F., Jiang Z., Zhang Q., et al, “Canet: Context aware network for brain glioma segmentation,” IEEE Transactions on Medical Imaging, vol. 40, no. 7, pp. 1763–1777, 2021.
- 10. Zhang Weidong, et al. “SSTNet: Spatial, Spectral, and Texture Aware Attention Network Using Hyperspectral Image for Corn Variety Identification”. IEEE Geoscience and Remote Sensing Letters 2022, 19, 1–5.
- 11. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
- 12. Gu Z., Cheng J., Fu H., Zhou K., Hao H., Zhao Y., et al, “Ce-net: Context encoder network for 2d medical image segmentation,” IEEE transactions on medical imaging, vol. 38, no. 10, pp. 2281–2292, 2019.
- 13. D. Jha, M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen, “Doubleu-net: A deep convolutional neural network for medical image segmentation,” in 2020 IEEE 33rd International symposium on computer-based medical systems (CBMS). IEEE, 2020, pp. 558–564.
- 14. Zhang Z., Liu Q., and Wang Y., “Road extraction by deep residual u-net,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, 2018.
- 15. Zhang J., Xie Y., Wang Y., and Xia Y., “Inter-slice context residual learning for 3d medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 40, no. 2, pp. 661–672, 2020.
- 16. Z. Jiang, C. Ding, M. Liu, and D. Tao, “Two-stage cascaded u-net: 1st place solution to brats challenge 2019 segmentation task,” in International MICCAI brainlesion workshop. Springer, 2019, pp. 231–241.
- 17. A. Myronenko, “3d mri brain tumor segmentation using autoencoder regularization,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 311–320.
- 18. Zhang J., Xie Y., Zhang P., Chen H., Xia Y., and Shen C., “Light-weight hybrid convolutional network for liver tumor segmentation,” in IJCAI, vol. 19, 2019, pp. 4271–4277.
- 19. Qin D., Bu J.-J., Liu Z., Shen X., Zhou S., Gu J.-J., et al, “Efficient medical image segmentation based on knowledge distillation,” IEEE Transactions on Medical Imaging, vol. 40, no. 12, pp. 3820–3831, 2021. pmid:34283713
- 20. Y. Liu, Z. Shao, and N. Hoffmann, “Global attention mechanism: Retain information to enhance channel-spatial interactions,” arXiv preprint arXiv:2112.05561, 2021.
- 21. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
- 22. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.
- 23. Arunkumar N., Mohammed M. A., Abd Ghani M. K., Ibrahim D. A., Abdulhay E., Ramirez-Gonzalez G., et al, “K-means clustering and neural network for object detecting and identifying abnormality of brain tumor,” Soft Computing, vol. 23, no. 19, pp. 9083–9096, 2019.
- 24. Raja P. S. et al., “Brain tumor classification using a hybrid deep autoencoder with bayesian fuzzy clustering-based segmentation approach,” Biocybernetics and Biomedical Engineering, vol. 40, no. 1, pp. 440–453, 2020.
- 25. Khosravanian A., Rahmanimanesh M., Keshavarzi P., and Mozaffari S., “Fast level set method for glioma brain tumor segmentation based on superpixel fuzzy clustering and lattice boltzmann method,” Computer Methods and Programs in Biomedicine, vol. 198, p. 105809, 2021. pmid:33130495
- 26. M. Gurbină, M. Lascu, and D. Lascu, “Tumor detection and classification of mri brain image using different wavelet transforms and support vector machines,” in 2019 42nd International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2019, pp. 505–508.
- 27. Yang T., Song J., and Li L., “A deep learning model integrating sk-tpcnn and random forests for brain tumor segmentation in mri,” Biocybernetics and Biomedical Engineering, vol. 39, no. 3, pp. 613–623, 2019.
- 28. Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: A nested u-net architecture for medical image segmentation,” in Deep learning in medical image analysis and multimodal learning for clinical decision support. Springer, 2018, pp. 3–11.
- 29. Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3d u-net: learning dense volumetric segmentation from sparse annotation,” in International conference on medical image computing and computer-assisted intervention. Springer, 2016, pp. 424–432.
- 30. Jin Q., Meng Z., Sun C., Cui H., and Su R., “Ra-unet: A hybrid deep attention-aware network to extract liver and tumor in ct scans,” Frontiers in Bioengineering and Biotechnology, p. 1471, 2020. pmid:33425871
- 31. Isensee F., Jaeger P. F., Kohl S. A., Petersen J., and Maier-Hein K. H., “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature methods, vol. 18, no. 2, pp. 203–211, 2021. pmid:33288961
- 32. S. Kohl, B. Romera-Paredes, C. Meyer, J. De Fauw, J. R. Ledsam, K. Maier-Hein, et al, “A probabilistic u-net for segmentation of ambiguous images,” Advances in neural information processing systems, vol. 31, 2018.
- 33. R. Brügger, C. F. Baumgartner, and E. Konukoglu, “A partially reversible u-net for memory-efficient volumetric image segmentation,” in International conference on medical image computing and computer-assisted intervention. Springer, 2019, pp. 429–437.
- 34. C. Huang, H. Han, Q. Yao, S. Zhu, and S. K. Zhou, “3d u2-net: A 3d universal u-net for multi-domain medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 291–299.
- 35. C. Chen, X. Liu, M. Ding, J. Zheng, and J. Li, “3d dilated multi-fiber network for real-time brain tumor segmentation in mri,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 184–192.
- 36. Gu R., Wang G., Song T., Huang R., Aertsen M., Deprest J., et al, “Ca-net: Comprehensive attention convolutional neural networks for explainable medical image segmentation,” IEEE transactions on medical imaging, vol. 40, no. 2, pp. 699–711, 2020.
- 37. O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, et al., “Attention u-net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999, 2018.
- 38. X. Gan, L. Wang, Q. Chen, Y. Ge, and S. Duan, “Gau-net: U-net based on global attention mechanism for brain tumor segmentation,” in Journal of Physics: Conference Series, vol. 1861, no. 1. IOP Publishing, 2021, p. 012041.
- 39. M. Islam, V. Vibashan, V. Jose, N. Wijethilake, U. Utkarsh, and H. Ren, “Brain tumor segmentation and survival prediction using 3d attention unet,” in International MICCAI Brainlesion Workshop. Springer, 2019, pp. 262–272.
- 40. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141.
- 41. X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7794–7803.
- 42. H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, et al, “Unet 3+: A full-scale connected unet for medical image segmentation,” in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020, pp. 1055–1059.
- 43. Ibtehaz N. and Rahman M. S., “Multiresunet: Rethinking the u-net architecture for multimodal biomedical image segmentation,” Neural Networks, vol. 121, pp. 74–87, 2020. pmid:31536901
- 44. F. Liu, X. Ren, Z. Zhang, X. Sun, and Y. Zou, “Rethinking skip connection with layer normalization in transformers and resnets,” arXiv preprint arXiv:2105.07205, 2021.
- 45. Luo Z., Jia Z., Yuan Z., and Peng J., “Hdc-net: Hierarchical decoupled convolution network for brain tumor segmentation,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 3, pp. 737–745, 2020.
- 46. F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.
- 47. S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1492–1500.
- 48. S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3–19.
- 49. Zeng G. and Zheng G., “Holistic decomposition convolution for effective semantic segmentation of medical volume images,” Medical image analysis, vol. 57, pp. 149–164, 2019. pmid:31302511
- 50. Zhang J., Jiang Z., Dong J., Hou Y., and Liu B., “Attention gate resu-net for automatic mri brain tumor segmentation,” IEEE Access, vol. 8, pp. 58533–58545, 2020.
- 51. F. Wang, R. Jiang, L. Zheng, C. Meng, and B. Biswal, “3d u-net based brain tumor segmentation and survival days prediction,” in International MICCAI Brainlesion Workshop. Springer, 2019, pp. 131–141.
- 52. Sun J., Peng Y., Guo Y., and Li D., “Segmentation of the multimodal brain tumor image used the multi-pathway architecture method based on 3d fcn,” Neurocomputing, vol. 423, pp. 34–45, 2021.
- 53. N. Nuechterlein and S. Mehta, “3d-espnet with pyramidal refinement for volumetric brain tumor image segmentation,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 245–253.
- 54. P.-Y. Kao, T. Ngo, A. Zhang, J. W. Chen, and B. Manjunath, “Brain tumor segmentation and tractographic feature extraction from structural mr images for overall survival prediction,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 128–141.
- 55. Sun Y. and Wang C., “A computation-efficient cnn system for high-quality brain tumor segmentation,” Biomedical Signal Processing and Control, vol. 74, p. 103475, 2022.
- 56. J. Tong and C. Wang, “A performance-consistent and computation-efficient cnn system for high-quality automated brain tumor segmentation,” arXiv preprint arXiv:2205.01239, 2022.
- 57. S. Chandra, M. Vakalopoulou, L. Fidon, E. Battistella, T. Estienne, R. Sun, et al, “Context aware 3d cnns for brain tumor segmentation,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 299–310.
- 58. Rehman M. U., Cho S., Kim J., and Chong K. T., “Brainseg-net: Brain tumor mr image segmentation via enhanced encoder–decoder network,” Diagnostics, vol. 11, no. 2, p. 169, 2021. pmid:33504047
- 59. F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, and K. H. Maier-Hein, “No new-net,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 234–244.
- 60. S. Puch, I. Sánchez, A. Hernández, G. Piella, and V. Prckovska, “Global planar convolutions for improved context aggregation in brain tumor segmentation,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 393–405.
- 61. E. Carver, C. Liu, W. Zong, Z. Dai, J. M. Snyder, J. Lee, et al, “Automatic brain tumor segmentation and overall survival prediction using machine learning algorithms,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 406–418.
- 62. Di Ieva A., Russo C., Liu S., Jian A., Bai M. Y., Qian Y., et al, “Application of deep learning for automatic segmentation of brain tumors on magnetic resonance imaging: a heuristic approach in the clinical scenario,” Neuroradiology, vol. 63, no. 8, pp. 1253–1262, 2021. pmid:33501512
- 63. Y.-X. Zhao, Y.-M. Zhang, and C.-L. Liu, “Bag of tricks for 3d mri brain tumor segmentation,” in International MICCAI Brainlesion Workshop. Springer, 2019, pp. 210–220.
- 64. X. Li, G. Luo, and K. Wang, “Multi-step cascaded networks for brain tumor segmentation,” in International MICCAI Brainlesion Workshop. Springer, 2019, pp. 163–173.
- 65. A. Myronenko and A. Hatamizadeh, “Robust semantic segmentation of brain tumor regions from 3d mris,” in International MICCAI Brainlesion Workshop. Springer, 2019, pp. 82–89.
- 66. Ali M. J., Raza B., and Shahid A. R., “Multi-level kronecker convolutional neural network (ml-kcnn) for glioma segmentation from multi-modal mri volumetric data,” Journal of Digital Imaging, vol. 34, no. 4, pp. 905–921, 2021. pmid:34327627