Abstract
Semantic segmentation, a fundamental computer vision task, has significant applications in medical image understanding, including the segmentation of brain tumors. In this context, the G-Shaped Net architecture emerges as a promising design that combines components from several models to attain improved accuracy and efficiency. The G-Shaped Net architecture synergistically incorporates four fundamental components: the Self-Attention, Squeeze Excitation, Fusion, and Spatial Pyramid Pooling block structures. These components work together to improve the precision and effectiveness of brain tumor segmentation. Self-Attention, a crucial component of the G-Shaped architecture, gives the model the ability to concentrate on the image’s most informative regions, enabling accurate localization of tumor boundaries. Squeeze Excitation complements this by recalibrating channel-wise feature maps, improving the model’s capacity to capture fine-grained information in medical images. Because the G-Shaped model’s Spatial Pyramid Pooling component provides multi-scale contextual information, the model can handle tumors of various sizes and complexity levels. Additionally, the Fusion blocks combine features from multiple sources, enabling a thorough understanding of the image and improving the segmentation outcomes. The G-Shaped Net architecture is an asset for medical imaging and diagnostics and represents a substantial advance in semantic segmentation, which is increasingly needed for accurate brain tumor segmentation.
Citation: D. S. CS, Clement J. C (2024) G-Net: Implementing an enhanced brain tumor segmentation framework using semantic segmentation design. PLoS ONE 19(8): e0308236. https://doi.org/10.1371/journal.pone.0308236
Editor: Peng Geng, Shijiazhuang Tiedao University, CHINA
Received: February 9, 2024; Accepted: July 18, 2024; Published: August 6, 2024
Copyright: © 2024 D. S., Clement J. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in the study are available from https://www.kaggle.com/datasets/awsaf49/brats20-dataset-training-validation.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Brain Tumor Segmentation (BTS) is a computational medical image analysis technique for distinguishing brain tumors from healthy brain tissue in magnetic resonance imaging (MRI) data. The ultimate goal of BTS is to develop a multidimensional segmentation model that accurately portrays the tumor’s precise position and size. Glioma, a common malignant brain tumor, arises from glial cells in the brain and spinal cord. Gliomas are aggressive cancers, and the median survival time for glioma patients is about 12 months. The need for early tumor detection makes MRI a key tool in this process. The various MRI sequences, such as T1-weighted, T2-weighted, T1-weighted contrast-enhanced, and T2 Fluid Attenuated Inversion Recovery, emphasize different tumor characteristics and provide high-spatial-resolution anatomical information. For tumor diagnosis, the annotation and segmentation of tumor borders must be accurate. Manual segmentation, however, is expensive, time-consuming, and prone to human error, particularly when tumors have diverse intensities and shapes across sub-regions. The identification of brain anomalies on MR imaging involves several steps, including image preprocessing, feature extraction, characterization, image enhancement and visualization, segmentation, and classification [1]. M. Ghaffari, A. Sowmya and R. Oliver review the evolution of automated models for BTS using multimodal MR images, highlighting the challenge of developing such methods due to the heterogeneity of brain tumors [2]. Montaha, Sidratul, et al. presented a method for automated BTS from 3D MRI scans using an optimized U-Net model; after normalization and rescaling, the model achieves an accuracy of 99.41% and a dice similarity coefficient of 93% [3]. Sangui, Smarta, et al. developed a modified U-Net architecture for detecting and segmenting brain tumors from MRI images using a deep-learning framework. The model attained a test accuracy of 99.4% on the BRATS 2020 dataset, and the U-Net model outperforms other DL-based models, making it a valuable tool for detecting and analyzing brain tumors compared to other methods [4]. A modified U-Net structure using residual networks and sub-pixel convolution was proposed, enhancing modelling capability and avoiding de-convolution overlapping. The model was evaluated on the Brain Tumor Segmentation (BraTS) Challenge 2017–2018, achieving segmentation accuracies of 93.40% and 92.20%, respectively [5].
Abdullah Al Nasim, M. D., et al. used U-Net to segment brain tumors from MRI images, focusing on necrotic, edematous, growing, and healthy tissue. The 2D U-Net network was trained on the BraTS datasets, reducing computational time by excluding background details. Experiments show the proposed model works well, with dice scores of 0.8717 for necrotic, 0.9506 for edema, and 0.9427 for enhancing tissue [6]. Brain tumors are a common type of tumor throughout the world, and because of slow and inadequate diagnostic techniques and algorithms, the disease is increasingly a major cause of death. Existing brain tumor diagnosis techniques cope poorly with the demands of larger datasets: larger datasets take longer to process, which slows down system performance. The accuracy, specificity, and sensitivity of the models are evaluated to determine how well they work. The G-Shaped architecture is designed and developed as a key solution to these challenges, and its performance in the proposed workflow is evaluated with the same accuracy, specificity, and sensitivity measures.
2 Related works
The research focuses on improving autonomous BTS and classification techniques for better diagnosis. Surveys have been conducted to explore techniques used in this medical image analysis field. This survey provides a comprehensive overview of proposed techniques such as segmentation, suppression of irrelevant regions, salient feature extraction, machine learning, and deep learning. It also covers the technical aspects, strengths, weaknesses, and performance metrics of the proposed architecture models. The aim is to enhance the diagnostic capabilities of physicians and healthcare professionals.
Roy, Sunita, et al. proposed two new CNN-based models, S-Net and SA-Net, for image segmentation in medical imaging, particularly for brain tumors in MRI scans. These models use U-Net as the base architecture and leverage Merge Block and Attention Block concepts [7]. The U-shaped architecture with Block-R3 applied outperformed Block-R1 and Block-R2; the proposed AD unit extracted detailed tumor features efficiently, achieving dice scores of 0.90, 0.80, and 0.76 on the BraTS20 dataset [8]. MA-UNet models channel dependencies and utilizes multi-scale predictive fusion, resulting in superior segmentation performance compared to existing networks [9]. Ali, Tahir Mohammad, et al. described and developed an attention-based convolutional neural network for BTS using the BRATS’20 dataset. The results show dice similarity coefficients of 0.83, 0.86, and 0.90 for enhancing, core, and whole tumors [10].
Bindu, N. Phani, and P. Narahari Sastry used ResNet50 as an encoder in the U-Net model to enhance segmentation precision and efficacy in medical imaging applications. This approach introduces cross or skip connections between network blocks, reducing the need for frequent skip connections. Test IoU and Test Dice coefficient values of 0.902 and 0.948 were obtained, respectively [11]. A hybrid of a deep residual network and the U-Net model was proposed, using the residual network as an encoder and the U-Net model as a decoder to address vanishing gradient issues. Validation on an external cohort showed the model’s robustness in real-world clinical settings, with dice scores of 0.8400, 0.8601, and 0.8221 for TC, WT, and ET, respectively [12]. Vijay, Sanchit, et al. proposed a model called SPP-U-Net, which uses a combination of Spatial Pyramid Pooling (SPP) and Attention blocks to improve performance. The model’s average Dice score and Hausdorff distance are 0.883 and 7.99, respectively [13]. Baid, U. et al. presented a method for glioma tumor segmentation and survival prediction using a Deep Learning Radiomics Algorithm for Gliomas (DRAG) Model and a 3D patch-based U-Net model. The model achieved good performance in BraTS 2018, with Dice scores of 0.88, 0.83, and 0.75 for WT, TC, and ET, respectively [14]. Isensee, Fabian, et al. applied nnU-Net to the BraTS 2020 challenge segmentation task, achieving respectable results. By incorporating BraTS-specific modifications, including postprocessing, region-based training, and data augmentation, performance is significantly improved. The method obtained Dice scores of 88.95, 85.06, and 82.03 for WT, TC, and ET [15].
Biratu, Siyoum, et al. investigated BTS and classification using region-growing, shallow machine learning, and deep learning methods. They discussed usage, pre-processing, feature extraction, segmentation, classification, post-processing, model performance, pros and cons, and model evaluation metrics [16]. For automated BTS, Aboelenein, Nagwa M., et al. suggested a novel MIRAU-Net model incorporating residual inception modules, attention gates, and encoder and decoder sub-networks, with a multi-loss function aimed at reducing class imbalance [17]. Yousef, Rammah et al. explored four U-Net architectures (3D, Attention, R2 Attention, modified 3D U-Net) on the BraTS 2020 dataset for BTS, evaluating their performance in terms of Dice score and 95% Hausdorff distance and emphasizing the significance of visualizations [18].
Major trends for future research strategies are emphasized in the discussion of recent U-Net architecture-based BTS approaches [19]. Yang, Tiejun, and Jikun Song proposed a fully automatic brain tumor MRI segmentation algorithm that utilizes a semantic segmentation U-Net model, integrating image features from image patch datasets and adding a 1x1 convolutional layer [20]. Prasanna, Gaurav et al. proposed a double attention-based scheme incorporating a squeeze and excitation network and a soft attention mechanism. The model can be tested on a 3D medical imaging dataset to enhance performance and achieve a higher Jaccard coefficient [21].
Automatic BTS for MRI images utilizes encoder-decoder-based convolutional neural networks, particularly UNET and SEGNET [22]. Hmeed, Assef Raad, et al. used the U-Net model and a fully convolutional network technique for semantic segmentation on BraTS 2018, achieving mean dice similarity coefficients of 0.87, 0.76, and 0.71 [23]. Agrawal, Pranjal, et al. introduced a 3D U-Net model for volumetric BTS, utilizing a CNN-based automated system for segmentation and feature extraction and a classical neural network for classification [24]. The BRATS benchmark evaluated current methods, which achieve Dice scores of over 80% for the whole tumor, although the active core region remains more challenging [25]. Inspired by MobileNetV2 and U-Net, Saeed, Muhammad Usman, et al. developed an effective DL-based RMU-Net model for BTS; it outperforms other models while using fewer parameters and achieves high dice coefficient scores [26].
DAU-Net is a deep supervised nested segmentation network that employs a modified dense skip connection for feature detection and merging, developed by Na Li and Kai Ren [27]. The SCAU-Net is a 3D U-Net model for BTS, enhancing semantic up-sampling through external attention and self-calibrated convolution, which demonstrated impressive performance on the BraTS 2020 validation dataset with dice similarity coefficients of 0.905, 0.821, and 0.781 [28]. Huang, He, et al. introduced a deep framework for BTS that implements a V-Net-based distance transform decoder for better accuracy and feature extraction. On the 2020 BraTS dataset, the model demonstrated remarkable performance, with Dice metrics of 0.75, 0.86, and 0.77 for the ET, WT, and TC regions, respectively [29].
The work introduces NLCA-VNet, an automated glioblastoma segmentation approach that combines VNet with nonlocal and convolutional block attention modules, improving segmentation performance by retaining more information and applying attention in the channel and spatial dimensions [30]. The study introduces AGSE-VNet, an automated brain tumour MRI segmentation framework that employs Squeeze-and-Excite modules and an Attention Guide Filter to enhance usable information and reduce noise; it reports Dice scores of 0.68, 0.85, and 0.70 for WT, TC, and ET, respectively [31]. The study describes TransBTS, a network built on an encoder-decoder framework that employs a Transformer within a 3D CNN for MRI BTS [32].
By extracting information from the entire image, the study suggests modifying U-Net and implementing an attention block (AttU-Net), which raises the Dice score by 5%. The technique was evaluated on the BraTS 2021 challenge dataset and yielded promising ET, TC, and WT scores of 0.793, 0.819, and 0.879, respectively [33]. In another study, a modified Bridged U-Net architecture with Atrous Spatial Pyramid Pooling (ASPP) and an evolving normalization layer was proposed. The model was evaluated on both the BraTS 2020 and BraTS 2021 challenge datasets, outperforming other cutting-edge models [34].
Researchers have discussed the use of transfer learning for brain tumor classification, highlighting its potential to improve accuracy while emphasizing the need for large annotated datasets and computational resources. [35] suggested leveraging transfer learning to reduce training time and enhance model performance; the model’s effectiveness was evaluated, revealing its potential for clinical use and enhanced diagnostic accuracy. [36] applied evolutionary algorithms, such as genetic algorithms and nature-inspired optimization techniques, together with DL techniques for feature extraction and classification, showing promising results in accurately grading and classifying brain tumors, reducing diagnostic errors and improving patient outcomes. [37] combined handcrafted features with global pathway-based DL; the hybrid approach improved quality and robustness, enhancing clinical precision in BTS. Handcrafted CNNs are introduced in [39] to improve accuracy and efficiency in BTS from MRI images; customized to the specific characteristics of the data, this approach shows significant gains over general CNN models. Scalable federated learning is used in [38] to increase accuracy while preserving data privacy and confidentiality, proving the usefulness of these methods in medical image analysis.
CNNs have recently made tremendous progress in medical image processing, especially in segmenting and classifying cancer images. In [40], CNNs outperform conventional techniques in recognizing cancer cells from breast cytopathology images. Exigent Examiner and Mean Teacher models are combined in a new 3D CNN-based semi-supervised learning framework for brain tumor segmentation, which improves segmentation performance by using both labeled and unlabeled data [41]. This research shows how semi-supervised learning methods and CNNs can be used to increase the precision and effectiveness of cancer image analysis.
Building on this literature, the authors propose an automated BTS system with an architectural design that delivers reproducible segmentation performance comparable to manual results. This architecture can alleviate the difficulty of manually analyzing brain tumors, speed up image analysis, improve diagnostic outcomes, and facilitate disease follow-up by evaluating tumor progression. Among the BTS techniques proposed in the scientific literature, predictive modeling, ML, neural network, and DL-based approaches are reviewed with respect to the clinical dataset, pre-processing, feature extraction, segmentation algorithm, and observed overall outcomes. To ensure proper data format, segmentation properties, and accurate labeling for efficient training of deep learning models, this work also investigates data preprocessing techniques for medical images with a focus on segmentation tasks.
3 Methodology
In the methodology section of this research paper, we describe in detail the key building blocks employed in our deep CNN architecture designed for image segmentation tasks. These building blocks, namely convolutional layers, Self-Attention (SA), Squeeze Excitation (SE), Spatial Pyramid Pooling (SPP), and Fusion blocks, serve to enhance the network’s feature extraction capabilities, attention mechanisms, and multi-scale feature processing, ultimately contributing to improved segmentation performance. The proposed model integrates convolutional layers, SE, SPP, Fusion blocks, and SA within the G-Net (G-Shaped Net) architecture, as shown in Fig 1. A detailed explanation of each block is given below.
3.1 Squeeze excitation block
The Squeeze Excitation (SE) block is a crucial component of the G-Net architecture designed to enhance channel-wise feature dependencies in the input tensor. This block begins by applying global average pooling, which computes the average value for each channel across the entire spatial domain of the feature maps, effectively reducing the spatial dimensions while retaining the channel-wise information. The output from global average pooling is then fed through two fully connected layers with ReLU and sigmoid activations, respectively. These layers adjust the channel-wise weights, allowing the network to emphasize or de-emphasize certain channels based on their importance. Finally, a reshaping operation transforms the result into a tensor with dimensions (1, 1, channels). This tensor is element-wise multiplied with the original input tensor, effectively scaling each channel differently based on learned channel-wise weights. This process helps the network focus on the most relevant features, improving the model’s ability to capture important information for semantic segmentation. The SE module in our model accomplishes feature recalibration in two steps, Squeeze and Excitation. Global average pooling aggregates global spatial information during the Squeeze step; this information is then used in the Excitation step to create channel-wise dependencies via two fully connected (Dense) layers with ReLU and sigmoid activations, respectively. The effect of passing an input image through a single SE block is shown in Fig 2.
The mathematical representation of the SE block is given below. Given an input tensor X with dimensions (H, W, C), and considering a single channel:

Squeeze: calculate the channel importance for each channel c in the input tensor using global average pooling:

Zs(c) = (1 / (H · W)) Σ_{i=1..H} Σ_{j=1..W} X(i, j, c)

H represents the height of the input tensor, W represents the width of the input tensor, X(i, j, c) is the value at spatial coordinates (i, j) in channel c of the input tensor, and Zs and Ze are the channel importance values produced by the Squeeze and Excitation steps, respectively.

Excitation: refine the channel importance using two fully connected (Dense) layers. The first layer introduces non-linearity with ReLU activation, and the second layer squashes values between 0 and 1 with sigmoid activation:

Ze = σ(W2 · ReLU(W1 · Zs))

W1 and W2 are learnable weights, ReLU represents the rectified linear unit activation function, and σ represents the sigmoid activation function.

Reshape the refined importance values to the dimensions (1 × 1 × C) and scale the original input tensor by element-wise multiplication with the importance values:

Y(i, j, c) = X(i, j, c) · Ze(c)

The result is an output tensor Y in which each channel’s information is adjusted based on its importance, emphasizing important features and de-emphasizing less important ones. This helps the neural network focus on relevant information in the input data.
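A minimal Keras sketch of the SE block as described above is given below. The layer sequence (global average pooling, Dense with ReLU, Dense with sigmoid, reshape, element-wise multiplication) follows the text; the function name and the reduction ratio are illustrative assumptions, since the paper does not list exact hyperparameters.

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, ratio=8):
    """Squeeze-and-Excitation: recalibrate channel-wise feature responses."""
    channels = x.shape[-1]
    # Squeeze: global average pooling aggregates spatial information per channel.
    z = layers.GlobalAveragePooling2D()(x)
    # Excitation: two fully connected (Dense) layers with ReLU and sigmoid.
    z = layers.Dense(channels // ratio, activation="relu")(z)
    z = layers.Dense(channels, activation="sigmoid")(z)
    # Reshape to (1, 1, C) and rescale the input channel-wise.
    z = layers.Reshape((1, 1, channels))(z)
    return layers.Multiply()([x, z])
```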
3.2 Self-Attention block
The Self-Attention (SA) Block introduces an attention mechanism into G-Net, allowing the model to focus on and emphasize specific regions within the input tensor that are deemed most relevant for the task at hand. This block begins by calculating attention weights through a 1x1 convolutional layer with sigmoid activation. These weights are derived from the input tensor and represent the importance of each spatial position within the feature maps. The attention weights are then element-wise multiplied with the original input tensor, resulting in an attention-weighted feature map. The model’s SA mechanism, which softens focus and adapts to input relevance, is particularly useful for tasks like semantic segmentation, ensuring precise localization and feature importance.
The SA Block involves the computation of attention weights using a 2D convolution with a sigmoid activation function:

A = σ(Conv2D(X; 1 × 1 kernel))

where A represents the computed attention weights, and the (1, 1) kernel size is used for channel-wise attention. The input feature map X is then rescaled using the computed attention weights A, resulting in the enhanced feature map Y:

Y = X ⊙ A
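The following is a hedged Keras sketch of the SA block: a 1x1 convolution with sigmoid activation produces the attention weights A, which then rescale the input feature map element-wise. The single-filter output (one weight per spatial position, broadcast over channels) and the function name are assumptions, since the text does not state the output depth of the attention convolution.

```python
from tensorflow.keras import layers

def self_attention_block(x):
    """Spatial attention: a 1x1 convolution with sigmoid gating rescales X."""
    # A = sigmoid(Conv2D_1x1(X)): one attention weight per spatial position,
    # broadcast over the channels of X during multiplication.
    attention = layers.Conv2D(1, kernel_size=1, activation="sigmoid",
                              padding="same")(x)
    # Y = X * A: emphasize informative regions, suppress the rest.
    return layers.Multiply()([x, attention])
```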
3.3 Spatial Pyramid Pooling block
The Spatial Pyramid Pooling block is a critical feature extraction component within G-Net, designed to capture multi-scale information from the input tensor. This block starts with the original input tensor and constructs a pyramid of features at various spatial resolutions. It does this by repeatedly applying max-pooling, convolution, and upsampling operations with the different pooling sizes specified in the pool-sizes parameter; each pooling size captures information at a different scale. After pooling, a 1x1 convolutional layer is used to transform the pooled features, and they are then upsampled to the original size. These feature maps at various scales are concatenated along the channel axis, creating a rich representation of the input data at different levels of detail. This multi-scale information is invaluable for semantic segmentation tasks, as it helps the model make contextually informed decisions about object boundaries and features. The mathematical representation of the SPP block is given below.
Given an input tensor X with dimensions (H, W, C), and considering a single pooling level, perform max pooling with a pooling window size of p × p:

Pp(i, j, c) = max_{0 ≤ m, n < p} X(p·i + m, p·j + n, c)

Apply a 1x1 convolution to reduce the number of channels to 256:

Fp(i, j, f) = Σ_{c=1..C} Wp(c, f) · Pp(i, j, c), f = 1, …, 256

Upsample the feature map to the original size:

Up = Upsample(Fp) to size (H, W)

Concatenate the upsampled feature maps along the channel axis to form the final SPP output:

Y = Concat(Up1, Up2, …, Upn)

where H and W represent the height and width of the input tensor, C represents the number of channels, p represents the pool size (a positive integer), i and j represent the spatial coordinates within the tensor, c and f represent the channel and filter index respectively, and Wp represents the weights for the 1x1 convolution.
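A minimal sketch of the SPP block described above follows: for each pool size p, max-pool, apply a 1x1 convolution with 256 filters (per the text), upsample back to the input resolution, and concatenate all scales along the channel axis. The default pool sizes, the ReLU on the 1x1 convolution, and the inclusion of the original tensor in the concatenation are assumptions; the sketch also assumes the spatial dimensions are divisible by each pool size.

```python
from tensorflow.keras import layers

def spp_block(x, pool_sizes=(2, 4, 8)):
    """Spatial Pyramid Pooling: aggregate context at multiple scales."""
    features = [x]
    for p in pool_sizes:
        # Pool at scale p x p to capture coarser context.
        pooled = layers.MaxPooling2D(pool_size=(p, p))(x)
        # 1x1 convolution transforms the pooled features (256 filters per the text).
        pooled = layers.Conv2D(256, kernel_size=1, activation="relu",
                               padding="same")(pooled)
        # Upsample back to the original spatial size before concatenation.
        pooled = layers.UpSampling2D(size=(p, p))(pooled)
        features.append(pooled)
    # Concatenate all scales along the channel axis.
    return layers.Concatenate(axis=-1)(features)
```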
3.4 Fusion block
The Fusion Block plays a pivotal role in combining feature maps from multiple sources within the G-Net architecture. It takes a list of input tensors and concatenates them along the channel axis to create a fused input tensor. This fused input tensor is then passed through a convolutional layer with a specified number of filters, kernel size, activation function, padding, and kernel initializer. This block integrates network information and feature extraction methods, enhancing semantic segmentation and accuracy by capturing complex patterns and relationships, especially in scenarios requiring multiple levels of abstraction.
The Fusion Block starts with the concatenation of input tensors X1, X2, …, Xn along the channel axis to create a fused input feature map F:

F = Concat(X1, X2, …, Xn)
Then, a 2D convolution is applied to the fused input feature map F, resulting in the output feature map Y. The convolution can be represented as:

Y = activation(Conv2D(F; filters, kernel_size, padding, kernel_initializer))

where F is the fused input feature map, filters is the number of filters in the convolution, kernel_size denotes the size of the convolutional kernel, activation specifies the activation function applied to the output, padding is the padding method, and kernel_initializer is the method for initializing the kernel weights.
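A hedged Keras sketch of the Fusion block follows: concatenate the input tensors along the channel axis, then apply a single convolution. The parameter names mirror those listed in the text, but the default kernel size, activation, padding, and initializer values here are assumptions.

```python
from tensorflow.keras import layers

def fusion_block(inputs, filters, kernel_size=3, activation="relu",
                 padding="same", kernel_initializer="he_normal"):
    """Fuse multiple feature maps and mix them with a single convolution."""
    # Concatenate X1, ..., Xn along the channel axis to form F.
    fused = layers.Concatenate(axis=-1)(inputs)
    # Y = activation(Conv2D(F)) integrates information across the sources.
    return layers.Conv2D(filters, kernel_size, activation=activation,
                         padding=padding,
                         kernel_initializer=kernel_initializer)(fused)
```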
Tables 1 and 2 summarize the feature extraction procedure, the integration of the distinct blocks, and the description and reasoning for each block. The integration and feature extraction process involves processing the input tensor through the SE block, multi-scale feature aggregation through the SPP block, feature fusion using the Fusion block, feature extraction through the convolutional block, and an attention mechanism through the SA block. The tensor is processed to adjust channel-wise features, capture contextual information at various scales, and refine features through two convolutional layers. The output is then processed to refine the feature map based on the computed attention weights.
3.5 Impact of our framework
Breakdown of each module’s’s operation to understand the functionality of the architecture. The model starts by applying ReLU activation and two convolutional layers with 64 filters to the input image through an initial convolutional block. Next, a SE module with global average pooling, dense layers with ReLU and sigmoid activations, and a reshaping step that multiplies the SE output by the input tensor to enhance significant features are applied to this output. Max-pooling is the procedure that comes after this to minimize spatial dimensions. The second and third convolutional blocks which use 128 and 256 filters, respectively are used in the same way. SE modules come after each block. Following the third block, a custom SPP layer pools features at several scales and concatenates them to generate a rich representation. SA mechanism then captures long-range dependencies. Upsampling the SPP layer’s output and utilizing fusion blocks to combine it with matching features from previous convolutional blocks constitute the decoder phase. Every fusion block applies a convolutional layer to combine the inputs after concatenating them. In order to ensure correct segmentation, the model employs a 1x1 convolutional layer with 4 filters and softmax activation to generate a segmentation mask. This allows for the detailed capture of features and their efficient combination. This architecture effectively combines SE, SPP, Fusion, SA block, convolutional layers to capture both local and global contextual information, enhance feature dependencies, and maintain fine spatial details, contributing to its proficiency in semantic segmentation.
4 Results and discussion
This section presents comprehensive experimental results demonstrating G-Net’s performance compared to other state-of-the-art models and the traditional U-Net. In order to build and train models for a variety of machine learning tasks, the system makes use of a PowerEdge R740 server equipped with an INTEL XEON Silver 4208 processor, Tesla V100 GPU, 128 GB DDR4 RAM, TensorFlow deep learning framework, and Keras API.
4.1 Dataset description
In our investigation using a publicly available Kaggle dataset, the data were partitioned as follows: training, 425 images; validation, 125 images; testing, 75 images. Details of the data are displayed in Table 3. Our initial approach involved partitioning the dataset at the slice level to maximize data utilization and maintain a balanced dataset. This method facilitated a comprehensive training phase, accounting for variations in tumor appearance across different slices. We ensured that each slice was exclusively assigned to either the training or the testing set. Importantly, the consistent performance across both partitioning strategies reaffirms the generalizability and robustness of G-Net. Notably, the training and testing sets do not contain slices from the same patient. This design choice ensures that improvements in performance, particularly in dice scores for tumor regions, are not influenced by shared characteristics among slices from the same patient. Thus, our partitioning technique is appropriate and does not compromise the validity of our performance assessment.
4.2 Model and parameters training approach
The G-Net model is trained with the labeled training data and a categorical cross-entropy loss function. Training specifics, such as optimizer selection and learning rate, are explained in this section. The model proposed in this work is independently trained and validated four times using distinct MRI sequences. The publicly available BRATS 2020 dataset is used to assess the proposed framework. An MRI scan of a patient comprises 155 slices, each measuring 240 x 240 x 1, giving a volume of 240 x 240 x 155 per sequence. The BRATS dataset contains corresponding T1, T1c, T2, and FLAIR MRI sequences from patients with either high-grade or low-grade gliomas; an associated data sample is shown in Fig 3. Each case was individually categorized into subgroups for peritumoral edema, enhancing tumor, and non-enhancing tumor core using the same labeling procedure.
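The following is a hedged sketch of the compile-and-fit setup implied by this section and the results section: categorical cross-entropy loss and early stopping on validation loss with a patience of five epochs, as reported later. The Adam optimizer, the metric choices, the batch size, and the x_train/y_train arrays are illustrative assumptions, not the authors’ exact configuration.

```python
import tensorflow as tf

model = build_gnet()  # G-Net sketch from the methodology section
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy",
             tf.keras.metrics.Precision(name="precision"),
             tf.keras.metrics.Recall(name="sensitivity")],
)

# Early stopping on validation loss, halting after five epochs without improvement.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# x_* are slices of shape (N, 128, 128, 4); y_* are one-hot masks of shape (N, 128, 128, 4).
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=50, batch_size=8,
                    callbacks=[early_stopping])
```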
4.3 Evaluation metrics
The model’s accuracy, precision, sensitivity, and specificity are calculated from the segmented ground truth of the tumor portion provided with the MRI; Eqs (1) and (2) are used for sensitivity and specificity.

Sensitivity is calculated using Eq (1), where the cardinalities of sets X and Y are denoted as |X| and |Y| respectively, G1 represents the tumor regions in the ground-truth images, and X1 represents the tumor regions predicted by the model:

Sensitivity = |X1 ∩ G1| / |G1| (1)

Specificity is calculated using Eq (2), where G0 represents the non-tumor tissue regions of the ground truth, and X0 represents the non-tumor tissue regions predicted by the model:

Specificity = |X0 ∩ G0| / |G0| (2)

The Tversky loss is calculated with Eq (3), where TP is the sum of true positives, FN is the sum of false negatives, FP is the sum of false positives, α and β are weights that control the penalties for false negatives and false positives respectively, and smooth is a small constant to avoid division by zero:

Tversky loss = 1 − (TP + smooth) / (TP + α·FN + β·FP + smooth) (3)

The boundary loss, Eq (4), is computed by applying Sobel edge detection to the true and predicted images and taking the mean absolute difference between their edges, scaled by a factor called boundary_weight:

Boundary loss = boundary_weight · mean(|Sobel(y_true) − Sobel(y_pred)|) (4)
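A hedged re-expression of Eqs (3) and (4) as TensorFlow loss functions is given below. The α, β, smooth, and boundary_weight defaults are illustrative assumptions; the text names these terms but does not report their values.

```python
import tensorflow as tf

def tversky_loss(y_true, y_pred, alpha=0.7, beta=0.3, smooth=1e-6):
    """Eq (3): 1 - (TP + smooth) / (TP + alpha*FN + beta*FP + smooth)."""
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    tp = tf.reduce_sum(y_true_f * y_pred_f)           # true positives
    fn = tf.reduce_sum(y_true_f * (1.0 - y_pred_f))   # false negatives
    fp = tf.reduce_sum((1.0 - y_true_f) * y_pred_f)   # false positives
    return 1.0 - (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)

def boundary_loss(y_true, y_pred, boundary_weight=1.0):
    """Eq (4): mean absolute difference between Sobel edges, scaled by boundary_weight."""
    edges_true = tf.image.sobel_edges(y_true)  # shape (B, H, W, C, 2): dy and dx responses
    edges_pred = tf.image.sobel_edges(y_pred)
    return boundary_weight * tf.reduce_mean(tf.abs(edges_true - edges_pred))
```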
In the results section of our research paper, we present exceptionally high performance metrics achieved after training our model for 30 epochs, firmly establishing the effectiveness of our approach. During this 30-epoch experiment, our model consistently achieved accuracy rates surpassing 99.42%, accompanied by a remarkably low loss value of 0.0165, demonstrating the model’s robust predictive capabilities and its proficiency in error minimization. Furthermore, precision scores consistently exceeded 99.50%, affirming the model’s ability to accurately identify positive cases while maintaining minimal false positives. Sensitivity consistently surpassed 99.26%, signifying the model’s capacity to capture a substantial proportion of true positive cases. Additionally, specificity exceeded 99.83%, showcasing its competence in effectively distinguishing between negative and positive instances. Training was halted at the 18th epoch to prevent overfitting and optimize efficiency: the EarlyStopping callback monitored the validation loss, and training was terminated when no improvement was observed for five consecutive epochs. This decision highlighted the importance of closely monitoring and fine-tuning the training process, especially when additional epochs may not significantly enhance performance. After implementing early stopping at the 18th epoch, the study continued training until the 50th epoch to explore the model’s performance, ensure stability, experiment with hyperparameters, and meet specific training objectives, justifying the extended training duration for comprehensive assessment and optimization. The values obtained per epoch are given in Table 4.
In the 50-epoch experiment, our model consistently delivered outstanding results, achieving accuracy rates consistently surpassing 99.42%, maintaining low loss values of 0.0165, and demonstrating precision scores of 99.50%, effectively minimizing false positives. Our research showcased a remarkable sensitivity exceeding 99.26%, underscoring the model’s proficiency in identifying true positive cases, while consistently upholding a specificity of 99.83%, demonstrating its excellence in distinguishing negative instances. The Tversky loss was calculated to be 0.0094, and the boundary loss was measured at 0.0246. After 50 epochs, we obtained a dice coefficient of 0.7034 for edema and 0.7091 for enhancing tumor. The accuracy and loss when trained for 18 epochs, together with the precision, sensitivity, and specificity at epoch 18, are presented in Figs 4 and 5; the corresponding results at epoch 50 are presented in Figs 6 and 7, respectively. Notably, these high-performance metrics not only remained stable but also exhibited marginal improvement during the 50-epoch experiment, further highlighting the robustness and reliability of our methodology over extended training periods. The graphs shown in Figs 8 and 9 display the model’s performance metrics at the critical epochs 18 and 50, showing significant enhancement over time, with lower loss values and improved accuracy and classification metrics. The better outcomes are demonstrated by the precision with which tumor classifications and the specific core, whole, and enhancing tumors can be distinguished, as seen in columns 4, 5, and 6 of the images. The DSC readings on the BraTS 2020 and 2021 data for Whole Tumor (WT), Tumor Core (TC), and Enhancing Tumor (ET) are compared against the most recent techniques in Tables 5 and 6. The descriptive results of using the G-Net model with BraTS2020 are shown in Fig 10. The most significant findings are highlighted in Table 7, and Fig 11 shows the qualitative outcomes of the G-Net model applied to BraTS2020.
5 Discussion and future direction
To fully realize the potential of this model in clinical settings, several critical aspects must be considered. First and foremost, regulatory approval is essential. Rigorous testing and validation on diverse datasets are necessary to demonstrate the model’s reliability and safety as a medical device. Healthcare professionals need proper training to interpret and utilize the segmentation results effectively, including understanding the model’s limitations and potential sources of error. Interoperability is crucial—ensuring that the model can seamlessly integrate with various imaging modalities and healthcare IT systems. Continuous improvement mechanisms, such as incorporating feedback from clinical use and updating the model with new data, are vital. Finally, conducting thorough cost-benefit analyses will demonstrate the economic advantages of adopting the model, including time savings, reduced diagnostic errors, and improved patient outcomes. By addressing these aspects, we can successfully integrate the model into clinical practice, enhancing medical diagnostics and patient care. The G-Net Model Integration and Performance Analysis Table 8 highlights the model’s limitations and suggests future approaches, with a special emphasis on how well the model scales to accommodate different tumor categories.
The model’s future directions include enhancing its generalizability by testing it on various tumor types beyond the current dataset, developing lightweight architectures to reduce computational overhead, addressing class imbalances in multi-class segmentation tasks to ensure fair representation of all tumor classes, and optimizing the model for real-time performance in clinical settings for timely decision-making and patient care. These advancements will help improve the model’s applicability and efficiency in practical applications.
6 Conclusion
This research introduces an automated technique for segmenting and measuring brain tumors using the BRATS 2020 MRI dataset, improving the visualization of target images for better patient care. It has been shown that, for the segmentation of brain tumors, the proposed G-Net outperformed the U-Net and existing state-of-the-art techniques. The quantitative evaluations showed that the proposed technique produced satisfactory outcomes when segmenting brain tumors. The method also showed how the segmentation process and outcomes can be improved by using the enhancement phase as a preliminary stage. This technique helps identify brain tumors early, which can lead to successful treatment and even save lives, and it improves diagnostic precision and lowers death rates by generalizing networks, predicting tumor aggressiveness, and optimizing computational resources. A G-Net model for segmenting brain tumors from MRI images is proposed in this research. The technique extracts ROIs (tumor areas in brain MRI) from each individual MRI sequence it receives as training data. Inputs are normalized and rescaled into single 128 x 128 images in order to reduce computational expense and boost efficiency. The field of BTS requires further enhancement of models like G-Net, domain adaptation, real-time integration with medical imaging systems, making AI-driven segmentation more understandable, personalizing models for individual patient characteristics, dealing with data scarcity through data augmentation, collaboration with medical experts, rigorous clinical validation, and scalability and accessibility through subsampling. The seamless integration of the G-Net model into clinical protocols ensures its practical applicability and effectiveness; it will facilitate real-time tumor segmentation, aiding in diagnosis and treatment decisions. Rigorous performance assessments will occur in clinical environments, with invaluable input from medical experts driving enhancements. G-Net’s scope will extend to encompass various medical imaging tasks, enhancing patient outcomes and diagnostic accuracy. Future research should investigate its integration within clinical workflows, assess its real-world performance, and explore its utility across imaging modalities such as PET, CT, and ultrasonography.
References
- 1. Tripathi S, Anand R, Fernandez E. A review of brain MR image segmentation techniques. In: Proceedings of International Conference on Recent Innovations in Applied Science, Engineering & Technology; 2018. p. 16–17.
- 2. Ghaffari M, Sowmya A, Oliver R. Automated Brain Tumor Segmentation Using Multimodal Brain Scans: A Survey Based on Models Submitted to the BraTS 2012–2018 Challenges. IEEE Reviews in Biomedical Engineering. 2020;13:156–168. pmid:31613783
- 3. Montaha S, Azam S, Rakibul Haque Rafid A, Hasan MZ, Karim A. Brain Tumor Segmentation from 3D MRI Scans Using U-Net. SN Computer Science. 2023;4(4):386.
- 4. Sangui S, Iqbal T, Chandra PC, Ghosh SK, Ghosh A. 3D MRI Segmentation using U-Net Architecture for the detection of Brain Tumor. Procedia Computer Science. 2023;218:542–553.
- 5. Pedada KR, Rao B, Patro KK, Allam JP, Jamjoom MM, Samee NA. A novel approach for brain tumour detection using deep learning based technique. Biomedical Signal Processing and Control. 2023;82:104549.
- 6. Abdullah Al Nasim M, Al Munem A, Islam M, Aminul Haque Palash M, Mahim Anjum Haque M, Shah FM. Brain Tumor Segmentation using Enhanced U-Net Model with Empirical Analysis. arXiv e-prints. 2022; p. arXiv–2210.
- 7. Roy S, Saha R, Sarkar S, Mehera R, Pal RK, Bandyopadhyay SK. Brain Tumour Segmentation Using S-Net and SA-Net. IEEE Access. 2023;11:28658–28679.
- 8. Peng Y, Sun J. The multimodal MRI brain tumor segmentation based on AD-Net. Biomedical Signal Processing and Control. 2023;80:104336.
- 9. Cai Y, Wang Y. Ma-unet: An improved version of unet based on multi-scale and attention mechanism for medical image segmentation. In: Third International Conference on Electronics and Communication; Network and Computer Technology (ECNCT 2021). vol. 12167. SPIE; 2022. p. 205–211.
- 10. Ali TM, Nawaz A, Ur Rehman A, Ahmad RZ, Javed AR, Gadekallu TR, et al. A sequential machine learning-cum-attention mechanism for effective segmentation of brain tumor. Frontiers in Oncology. 2022;12:873268. pmid:35719987
- 11. Bindu NP, Sastry PN. Automated brain tumor detection and segmentation using modified UNet and ResNet model. Soft Computing. 2023;27(13):9179–9189.
- 12. Raza R, Bajwa UI, Mehmood Y, Anwar MW, Jamal MH. dResU-Net: 3D deep residual U-Net based brain tumor segmentation from multimodal MRI. Biomedical Signal Processing and Control. 2023;79:103861.
- 13. Vijay S, Guhan T, Srinivasan K, Vincent P, Chang CY. MRI brain tumor segmentation using residual Spatial Pyramid Pooling-powered 3D U-Net. Frontiers in public health. 2023;11:1091850. pmid:36817919
- 14. Baid U, Talbar S, Rane S, Gupta S, Thakur MH, Moiyadi A, et al. Deep learning radiomics algorithm for gliomas (drag) model: a novel approach using 3d unet based deep convolutional neural network for predicting survival in gliomas. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II 4. Springer; 2019. p. 369–379.
- 15. Isensee F, Jäger PF, Full PM, Vollmuth P, Maier-Hein KH. nnU-Net for brain tumor segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers, Part II 6. Springer; 2021. p. 118–132.
- 16. Biratu ES, Schwenker F, Ayano YM, Debelee TG. A survey of brain tumor segmentation and classification algorithms. Journal of Imaging. 2021;7(9):179. pmid:34564105
- 17. Aboelenein NM, Piao S, Noor A, Ahmed PN. MIRAU-Net: An improved neural network based on U-Net for gliomas segmentation. Signal Processing: Image Communication. 2022;101:116553.
- 18. Yousef R, Khan S, Gupta G, Siddiqui T, Albahlal BM, Alajlan SA, et al. U-Net-Based Models towards Optimal MR Brain Image Segmentation. Diagnostics. 2023;13(9):1624. pmid:37175015
- 19. Rao CS, Karunakara K. A comprehensive review on brain tumor segmentation and classification of MRI images. Multimedia Tools and Applications. 2021;80(12):17611–17643.
- 20. Yang T, Song J. An automatic brain tumor image segmentation method based on the u-net. In: 2018 IEEE 4th International Conference on Computer and Communications (ICCC). IEEE; 2018. p. 1600–1604.
- 21. Prasanna G, Ernest JR, Narayanan S, et al. Squeeze Excitation Embedded Attention UNet for Brain Tumor Segmentation. arXiv preprint arXiv:230507850; 2023.
- 22. Kasar PE, Jadhav SM, Kansal V. MRI Modality-based Brain Tumor Segmentation Using Deep Neural Networks; 2021.
- 23. Hmeed AR, Aliesawi SA, Jasim WM. Enhancement of the U-net architecture for MRI brain tumor segmentation. In: Next Generation of Internet of Things: Proceedings of ICNGIoT 2021. Springer; 2021. p. 353–367.
- 24. Agrawal P, Katal N, Hooda N. Segmentation and classification of brain tumor using 3D-UNet deep neural networks. International Journal of Cognitive Computing in Engineering. 2022;3:199–210.
- 25. Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE transactions on medical imaging. 2014;34(10):1993–2024. pmid:25494501
- 26. Saeed MU, Ali G, Bin W, Almotiri SH, AlGhamdi MA, Nagra AA, et al. RMU-net: a novel residual mobile U-net model for brain tumor segmentation from MR images. Electronics. 2021;10(16):1962.
- 27. Li N, Ren K. Double attention U-Net for brain tumor MR image segmentation. International Journal of Intelligent Computing and Cybernetics. 2021;14(3):467–479.
- 28. Liu D, Sheng N, Han Y, Hou Y, Liu B, Zhang J, et al. SCAU-net: 3D self-calibrated attention U-Net for brain tumor segmentation. Neural Computing and Applications. 2023;35(33):23973–23985.
- 29. Huang H, Yang G, Zhang W, Xu X, Yang W, Jiang W, et al. A deep multi-task learning framework for brain tumor segmentation. Frontiers in Oncology. 2021;11:690244. pmid:34150660
- 30. Fang Y, Huang H, Yang W, Xu X, Jiang W, Lai X. Nonlocal convolutional block attention module VNet for gliomas automatic segmentation. International Journal of Imaging Systems and Technology. 2022;32(2):528–543.
- 31. Guan X, Yang G, Ye J, Yang W, Xu X, Jiang W, et al. 3D AGSE-VNet: an automatic brain tumor MRI data segmentation framework. BMC medical imaging. 2022;22(1):1–18. pmid:34986785
- 32. Wang W, Chen C, Ding M, Yu H, Zha S, Li J. Transbts: Multimodal brain tumor segmentation using transformer. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24. Springer; 2021. p. 109–119.
- 33. Wang S, Li L, Zhuang X. AttU-Net: attention U-Net for brain tumor segmentation. In: International MICCAI Brainlesion Workshop. Springer; 2021. p. 302–311.
- 34. Yousef R, Khan S, Gupta G, Albahlal BM, Alajlan SA, Ali A. Bridged-U-Net-ASPP-EVO and deep learning optimization for brain tumor segmentation. Diagnostics. 2023;13(16):2633. pmid:37627893
- 35. Anwar RW, Abrar M, Ullah F. Transfer Learning in Brain Tumor Classification: Challenges, Opportunities, and Future Prospects. In: 2023 14th International Conference on Information and Communication Technology Convergence (ICTC); 2023. p. 24–29.
- 36. Ullah F, Nadeem M, Abrar M, Amin F, Salam A, Alabrah A, et al. Evolutionary Model for Brain Cancer-Grading and Classification. IEEE Access. 2023;11:126182–126194.
- 37. Ullah F, Nadeem M, Abrar M. Revolutionizing Brain Tumor Segmentation in MRI with Dynamic Fusion of Handcrafted Features and Global Pathway-based Deep Learning. KSII Transactions on Internet & Information Systems. 2024;18(1).
- 38. Ullah F, Nadeem M, Abrar M, Amin F, Salam A, Khan S. Enhancing Brain Tumor Segmentation Accuracy through Scalable Federated Learning with Advanced Data Privacy and Security Measures. Mathematics. 2023;11(19).
- 39. Ullah F, Nadeem M, Abrar M, Al-Razgan M, Alfakih T, Amin F, et al. Brain Tumor Segmentation from MRI Images Using Handcrafted Convolutional Neural Network. Diagnostics. 2023;13(16). pmid:37627909
- 40. Xiao M, Li Y, Yan X, Gao M, Wang W. Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example. In: Proceedings of the 2024 7th International Conference on Machine Vision and Applications; 2024. p. 145–149.
- 41. Wang Z, Voiculescu I. Exigent Examiner and Mean Teacher: A Novel 3D CNN-based Semi-Supervised Learning Framework for Brain Tumor Segmentation. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The 2nd Workshop of Medical Image Learning with Limited & Noisy Data (MILLanD); 2023.
- 42. Ruba T, Tamilselvi R, Beham MP. Brain tumor segmentation in multimodal MRI images using novel LSIS operator and deep learning. Journal of Ambient Intelligence and Humanized Computing. 2023;14(10):13163–13177.
- 43. Akter A, Nosheen N, Ahmed S, Hossain M, Yousuf MA, Almoyad MAA, et al. Robust clinical applicable CNN and U-Net based algorithm for MRI classification and segmentation for brain tumor. Expert Systems with Applications. 2024;238:122347.
- 44. Aghalari M, Aghagolzadeh A, Ezoji M. Brain tumor image segmentation via asymmetric/symmetric UNet based on two-pathway-residual blocks. Biomedical signal processing and control. 2021;69:102841.