
SVNC-Net: An optimized U-Net variant with 2D convolutions for lightweight 3D spleen segmentation

  • Mehmet Zahid Genc,

    Roles Conceptualization, Formal analysis, Investigation, Software, Writing – original draft

    Affiliation Department of Electrical and Electronics Engineering, Gazi University, Ankara, Turkey

  • Yaser Dalveren ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft

    yaser.dalveren@bakircay.edu.tr

    Affiliations Department of Electrical and Electronics Engineering, Izmir Bakircay University, Izmir, Turkey, Department of Cybernetics and Biomedical Engineering, VSB - Technical University of Ostrava, Ostrava, Czechia

  • Ali Kara,

    Roles Supervision, Validation, Writing – review & editing

    Affiliation Department of Electrical and Electronics Engineering, Gazi University, Ankara, Turkey

  • Mohammad Derawi,

    Roles Supervision, Validation, Writing – review & editing

    Affiliations Department of Cybernetics and Biomedical Engineering, VSB - Technical University of Ostrava, Ostrava, Czechia, Department of Electronic Systems, Norwegian University of Science and Technology, Gjovik, Norway

  • Jan Kubicek,

    Roles Supervision, Validation, Writing – review & editing

    Affiliation Department of Cybernetics and Biomedical Engineering, VSB - Technical University of Ostrava, Ostrava, Czechia

  • Marek Penhaker

    Roles Funding acquisition, Project administration, Supervision

    Affiliation Department of Cybernetics and Biomedical Engineering, VSB - Technical University of Ostrava, Ostrava, Czechia

Abstract

Accurate measurement of spleen volume is essential for the diagnosis of splenomegaly. While Computed Tomography (CT) is among the most reliable imaging modalities for this task, manual segmentation of the spleen is labor-intensive and impractical for routine clinical workflows. Automatic segmentation methods provide a more viable alternative for clinical deployment. In recent years, 3D Convolutional Neural Network (CNN) models have been widely used for this purpose due to their high segmentation accuracy. However, their computational and memory demands make them less suitable for real-time applications on edge devices with limited processing capabilities. To address these limitations, we introduce SVNC-Net (Spleen Volume and Neighborhood Convolutional Network) for efficient 3D spleen segmentation from CT scans. Rather than developing an entirely new architecture from scratch, SVNC-Net builds upon the U-Net framework with targeted architectural optimizations for efficiency. In SVNC-Net, each CT slice is processed independently using 2D convolutions. In its architecture, depthwise separable convolution is used to significantly reduce computational complexity and memory usage. To evaluate its performance and efficiency, a comparative analysis was conducted against well-known CNN-based models, including UPerNet, EMANet, CCNet, SegNet, and ShuffleNet. This evaluation was performed on two publicly available datasets used together for the first time in the literature. The promising results achieved from the comparative analysis verified that SVNC-Net is highly suitable for real-time applications and resource-constrained environments. Additionally, we explore post-training compression techniques such as pruning and quantization, which further enhance the model’s compactness and inference speed. These findings contribute to the ongoing efforts to develop efficient 2D deep learning models for 3D organ segmentation, particularly in resource-constrained clinical scenarios.

Introduction

The spleen is an important organ that plays a significant role in the immune response, as it removes waste products by filtering blood and produces white blood cells [1]. Abnormal enlargement of the spleen, known as splenomegaly, is commonly associated with cancers, infections, and other pathological conditions [2,3]. Spleen volume is therefore a key biomarker for detecting splenomegaly, and its accurate measurement is essential for proper diagnosis [4].

In clinical applications, ultrasound (US) imaging is commonly preferred for detecting splenomegaly [5]. However, US yields only the spleen length, whereas spleen volume is still considered the gold standard for determining disease severity [6]. One imaging modality that can be used to measure spleen volume is Magnetic Resonance Imaging (MRI) [7]. Although MRI is a valuable diagnostic tool, it faces challenges including longer acquisition times, higher cost, and limited machine availability. An alternative imaging modality is Computed Tomography (CT), regarded as the most reliable modality for accurate measurement of spleen volume. Traditionally, changes in spleen volume are assessed from CT scans by an expert using manual segmentation [8]. However, manual segmentation of the spleen is time-consuming, and automatic segmentation methods are more practical from a clinical implementation perspective. Yet the accuracy of the first generation of automatic segmentation methods, such as multi-atlas [9] or active contours [10], is limited by the variability in spleen shape [11].

Recently, Deep Learning (DL) approaches have been used for biomedical image segmentation tasks [12]. In particular, Convolutional Neural Network (CNN)-based models have become very popular for automatic spleen segmentation from CT scans [13–25]. It is important to note that these models are based on 3D architectures, since a CT scan is inherently 3D. Even though these architectures provide high segmentation performance, their computational costs and memory requirements remain significant concerns for clinical use, given the limited computational capabilities of edge devices. To address this issue, some works utilizing 2D convolutional architectures have been proposed [26–28]. Nevertheless, these works have proposed specific approaches or pipelines that can be incorporated into routine clinical workflows rather than providing a simple and lightweight 2D model for 3D spleen segmentation. This gap highlights the need for a task-specific, lightweight 2D convolutional architecture that is both computationally efficient and accurate enough for 3D spleen segmentation under standard clinical conditions. Moreover, post-training model compression techniques such as pruning and quantization have emerged as practical tools to further reduce model complexity and adapt existing architectures to embedded or real-time clinical settings.

In this study, we aim to develop a lightweight yet efficient approach for 3D spleen segmentation from CT scans. To this end, the Spleen Volume and Neighborhood Convolutional Network (SVNC-Net), an optimized and task-specific 2D variant of the traditional U-Net architecture, is proposed. The main idea behind the proposed model is to process each slice of 3D CT scans independently using 2D convolutions. A key advantage of SVNC-Net lies in its use of depthwise separable convolutions, which enhance computational efficiency by requiring fewer parameters than traditional convolutions and by reducing dependency on the number of channels. To comprehensively evaluate the segmentation accuracy and computational performance of SVNC-Net, comparative experiments are conducted against well-known CNN-based segmentation models, such as Unified Perceptual Parsing for Scene Understanding (UPerNet) [29], Expectation-Maximizing Attention Network (EMANet) [30], Criss-Cross Network (CCNet) [31], Semantic Segmentation Network (SegNet) [32], and ShuffleNet [33], on the publicly available datasets, including the Medical Segmentation Decathlon (MSD) [34] and Duke Spleen Data Set (DSDS) [35]. The results indicate that SVNC-Net achieves higher segmentation accuracy while significantly reducing training time and computational overhead. In addition to the architectural design, post-training compression techniques, including model-aware pruning and 8-bit quantization, are applied to SVNC-Net and the baseline models, and their effects on performance, memory usage, and inference time are reported. These compression techniques are applied only to the models trained on the MSD dataset, in order to analyze their computational impact without introducing dataset-specific confounders. Thus, the results of this study highlight the potential of lightweight models for real-time medical image analysis and mobile diagnostic applications. The key contributions of this study can be summarized as follows:

  • From a standard clinical workflow perspective, a lightweight 2D model (SVNC-Net) is proposed for 3D spleen segmentation from CT scans.
  • The performances of well-known CNN-based models are comparatively assessed on the publicly available datasets for spleen segmentation.
  • Useful insights that may enable the research community to develop future 2D DL models for 3D organ segmentation are provided.
  • Post-training model-aware pruning and quantization techniques are applied to demonstrate further compression and speed-up without significant performance loss.

The rest of the article is structured as follows. In the following section, the relevant works proposed in the literature are discussed. The architecture of SVNC-Net section presents the network design in detail, while the Experimental details section outlines the implementation and evaluation procedures. The results achieved from the experiments are described in the Results section. Next, the Discussion section summarizes the key findings. Finally, the article concludes with the Conclusion section.

Related work

The implementation of 2D CNN architectures for spleen segmentation from 3D CT scans has a significant potential to enhance decision-making processes within clinical practice. By utilizing 2D CNNs, segmentation tasks can benefit from reduced computational complexity and increased efficiency compared to fully 3D approaches. However, research in this domain remains in its early stages, with only a limited number of studies addressing the development of robust and generalizable segmentation models.

An automated pipeline was proposed by Moon et al. for the abdominal spleen segmentation [26]. This pipeline offers an end-to-end synthesized process that eliminates the need for package installation while enabling local management of intermediate results through three major stages: pre-processing of input data, spleen segmentation based on SSNet using ResNet network and GAN, and 3D reconstruction by aligning the segmentation results with the original image dimensions for future use, display, or demonstration. The experimental results showed that the pipeline could provide fast and accurate segmentation of the spleen compared to traditional measurement methods.

Zettler et al. introduced a two-step approach for 3D segmentation on abdominal organs inside volumetric CT images, including liver, kidney, spleen, and pancreas [27]. In the approach, initially, a bounding box was generated to extract the volume of interest of each organ. This was then used as input for the second step involving segmentation using U-Nets with varying architectural dimensions. It was aimed at comparing the performance of 2D U-Nets against their 3D counterparts. According to the results, with a mean dice score of 0.93, 2D U-Net architecture showed promising results for 3D CT data.

Yuan et al. presented a framework based on a Variational Auto-Encoder (VAE) to measure spleen volume from 2D spleen segmentations [28]. Within this framework, three methods were proposed, and their performance on 3D CT data (both single- and dual-view) was evaluated. It was shown that the regression VAE with fully connected layers and activation functions, which models the potentially non-linear relationship between the latent embedding and volume (RVAE-FCNR), achieved a mean relative volume agreement of 86.62% when using single-view data and 92.58% when using dual-view data.

The studies presented in the literature regarding the implementation of 2D CNN-based models for spleen segmentation from 3D CT scans are summarized in Table 1. As can be deduced from Table 1, while previous studies have proposed end-to-end pipelines and advanced frameworks for spleen segmentation, they often lack simplicity and generalizability required for deployment in clinical workflows.

Table 1. Studies on the use of 2D CNN-based models for spleen segmentation from 3D CT images.

https://doi.org/10.1371/journal.pone.0332482.t001

On the other hand, while 2D CNN architectures offer significant advantages in terms of computational efficiency, their inherent limitation is the inability to fully capture inter-slice contextual information present in 3D medical images. This can lead to reduced segmentation accuracy, especially in anatomically complex regions or in cases where organ boundaries are ambiguous. To mitigate this limitation, hybrid strategies have been proposed, such as using adjacent slices as input channels or integrating context-aware attention modules [36–38]. However, these methods often introduce additional complexity that may conflict with the goal of achieving a lightweight and easily deployable solution [39].

Recent research has increasingly focused on model compression strategies to enable real-time segmentation on resource-constrained devices. Post-training quantization (PTQ) and pruning techniques have been shown to significantly reduce model size and memory footprint while maintaining accuracy within acceptable margins. For instance, Han et al. demonstrated that the combination of pruning and quantization can yield a compression rate up to 35× without accuracy degradation [40]. Similarly, EfficientQ proposes a fast PTQ method with a layer-wise ADMM optimization and self-adaptive attention for segmentation tasks, achieving significant memory reduction while preserving accuracy on BraTS and LiTS datasets [41]. Xu et al. applied both quantization-aware training and post-training quantization to fully convolutional networks (FCNs) for gland segmentation, reporting up to 6.4 × memory reduction with even slight gains in segmentation accuracy [42]. In another study, evolutionary algorithms such as Differential Evolution (DE), Genetic Algorithms (GA), and Particle Swarm Optimization (PSO) were utilized to guide pruning for sparsity control in lung CT segmentation tasks, balancing model compactness and accuracy [43]. These approaches highlight the practical viability of compressed models in clinical workflows, especially when integrated into lightweight CNN backbones. Nevertheless, although several studies have applied pruning and quantization techniques for model compression in medical image analysis, very few have addressed their integration into task-specific segmentation models, particularly for spleen CT scans.

As a summary, to the best of our knowledge, no existing study has introduced a lightweight, compression-aware 2D segmentation model specifically tailored for spleen segmentation from 3D CT scans. Therefore, this study proposes SVNC-Net, which combines architectural efficiency with post-training compression techniques to ensure optimal performance and deployability under real-world clinical conditions.

The architecture of the SVNC-Net

The SVNC-Net is designed as an optimized and task-specific 2D variant of the traditional U-Net architecture, which is a widely used CNN for medical image segmentation [44]. The architecture of the model is shown in Fig 1. The model follows a symmetric encoder-decoder structure that enables efficient feature extraction while ensuring precise spatial reconstruction of segmented regions. It is designed to process grayscale or multi-channel medical images, such as CT scans, and generate segmentation masks that accurately define anatomical structures. The model accepts an input image of size 512 × 512 and outputs a segmentation map of the same spatial dimensions. In total, the architecture consists of 52 layers, incorporating convolutional layers, batch normalization, ReLU activation functions, pooling operations, and transposed convolutions for upsampling.

The encoder path comprises five downsampling stages, each consisting of convolutional operations followed by max-pooling layers that progressively reduce the spatial resolution while increasing the feature map depth. Initially, the input image of 512 × 512 pixels is processed by two standard convolutional layers with a kernel size of 3 × 3, producing 32 feature maps of the same resolution. These convolutions are followed by batch normalization and ReLU activation functions, which improve training stability and convergence. A 2 × 2 max-pooling layer subsequently reduces the spatial dimensions to 256 × 256, after which the number of feature channels is doubled. The second stage introduces depthwise separable convolutions, which process 64 feature maps at a resolution of 256 × 256 before another max-pooling operation reduces the size to 128 × 128. The third stage further deepens the model by increasing the feature maps to 128, followed by max-pooling to 64 × 64. The fourth stage expands the feature map depth to 256, and downsampling reduces the resolution to 30 × 30. At the lowest resolution stage, known as the bottleneck, the network processes 512 feature maps at 30 × 30 pixels, capturing the highest level of abstraction and ensuring that the most representative features are learned.

At the bottleneck layer, the model applies additional depthwise separable convolutions and standard convolutions to refine high-dimensional feature representations. Here, as discussed by Howard et al., depthwise separable convolution is particularly efficient relative to standard convolution [45]. In Fig 2, the process of standard convolution and depthwise separable convolution is illustrated to demonstrate their structural differences. Specifically, the computational cost of standard convolutions is:

Fig 2. The process of standard convolution and depthwise separable convolution.

https://doi.org/10.1371/journal.pone.0332482.g002

(1)

$C_{std} = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$

where the cost depends multiplicatively on the number of input channels $M$, the number of output channels $N$, the feature map size $D_F \times D_F$, and the kernel size $D_K \times D_K$. Although the depthwise convolution has limited effectiveness with single-channel data, the $1 \times 1$ pointwise convolution efficiently adds new feature maps, ensuring an overall gain in performance. In this case, the computational cost of depthwise separable convolutions can be calculated by

(2)

$C_{dws} = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$

Then, decomposing the convolution operation into two distinct steps, namely filtering and combination, yields a reduction in computational complexity:

(3)

$\dfrac{C_{dws}}{C_{std}} = \dfrac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \dfrac{1}{N} + \dfrac{1}{D_K^2}$
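To make the savings concrete, the two cost expressions of Eqs. (1) and (2) can be evaluated directly. The following sketch is illustrative only; the layer sizes are hypothetical and not taken from SVNC-Net:

```python
def standard_cost(dk, m, n, df):
    # Eq. (1): dk x dk kernel, m input channels, n output channels,
    # applied over a df x df feature map.
    return dk * dk * m * n * df * df

def separable_cost(dk, m, n, df):
    # Eq. (2): depthwise filtering (dk*dk*m*df*df) followed by
    # 1x1 pointwise combination (m*n*df*df).
    return dk * dk * m * df * df + m * n * df * df

# Hypothetical layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
ratio = separable_cost(3, 64, 128, 56) / standard_cost(3, 64, 128, 56)
# Matches Eq. (3): 1/n + 1/dk^2 = 1/128 + 1/9, roughly an 8x reduction.
```

For a typical 3 × 3 kernel, the reduction is dominated by the $1/D_K^2 = 1/9$ term, i.e., close to an order of magnitude fewer multiply-accumulate operations.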

This stage serves as a critical point in the network, ensuring that abstract spatial features are effectively learned before reconstruction begins. Once the feature extraction is complete, the decoder path restores the segmentation map by progressively upsampling the feature maps to their original resolution. The decoder consists of transposed convolutional layers that restore spatial dimensions while reducing feature map depth. Additionally, skip connections between corresponding encoder and decoder layers enable the network to retain fine-grained spatial details lost during downsampling. These connections help preserve anatomical boundaries and improve segmentation accuracy.

The upsampling process occurs in four stages. First, a transposed convolution increases the spatial resolution to 64 × 64, producing 256 feature maps. Skip connections are integrated from the corresponding encoder stage to fuse both high- and low-level spatial information. In the second upsampling stage, the resolution is increased to 128 × 128, with 128 feature maps being restored. The third upsampling stage further increases the resolution to 256 × 256, refining the segmentation details. Finally, in the fourth stage, the feature maps are upsampled to 512 × 512, with the number of feature channels reduced to 32. This ensures that the final segmentation mask maintains high spatial accuracy and is well-aligned with the input image.

The final prediction layer consists of a 1 × 1 convolution that outputs a segmentation mask of 512 × 512 pixels, ensuring that each pixel is classified into the appropriate category. The number of output channels in this layer depends on the segmentation task, with one output channel for binary segmentation and multiple channels for multi-class segmentation. The overall model is designed to optimize computational efficiency while maintaining high segmentation accuracy. Depthwise separable convolutions play a key role in reducing the number of trainable parameters while preserving feature extraction capacity [45]. Additionally, the inclusion of batch normalization and ReLU activation functions throughout the architecture ensures stable training dynamics and accelerates convergence.

The total number of layers in the model is 52, and its design is balanced between computational efficiency and segmentation performance. The use of skip connections helps maintain spatial integrity, while transposed convolutions ensure effective upsampling. The combination of standard and depthwise separable convolutions further enhances efficiency, making the model well-suited for large-scale medical imaging applications, such as spleen segmentation from CT scans. Given its structural design, the SVNC-Net is expected to perform effectively in clinical segmentation tasks, where accurate and computationally efficient solutions are required.

To provide a summary of the structural differences between the traditional U-Net and the SVNC-Net, the key architectural modifications and hyperparameter adjustments are presented in Table 2. The table outlines specific changes in convolutional operations, channel configurations, normalization strategies, and other core design elements, along with the motivation behind each modification.

Table 2. Summary of key architectural modifications and hyperparameter adjustments from the U-Net to SVNC-Net.

https://doi.org/10.1371/journal.pone.0332482.t002

Experimental details

Experiments were conducted to assess the effectiveness of the SVNC-Net model and the well-known CNN-based models considered as baseline models using publicly available datasets. In the following sections, the baseline models are introduced, and then a brief overview of the publicly available datasets utilized for testing the models is provided. Additionally, the pre-processing steps and implementation details are described.

Datasets

In the literature, only two single-organ CT segmentation datasets for the spleen are publicly available: the MSD dataset [34] and the DSDS [35]. The MSD dataset is widely utilized in the field of medical image segmentation, as it provides a large volume of multi-layered imaging data for various segmentation tasks. It comprises 61 3D portal venous phase CT scans, among which 41 cases can be used for training purposes. A sample of the MSD dataset is shown in Fig 3.

Fig 3. A sample of MSD spleen dataset: (a) Raw image; (b) Ground truth (spleen region is indicated as green color); (c) 3D display.

https://doi.org/10.1371/journal.pone.0332482.g003

As an alternative to the MSD dataset, the DSDS has been provided to facilitate the development of spleen segmentation models. The dataset includes 109 anonymized CT and MRI volumes, comprising a total of 6322 images. Specifically, it is divided into 29 axial CT post-contrast series, 40 coronal SSFSE MRI series, and 40 axial opposed phase MRI series. Fig 4 shows a sample of the DSDS dataset.

Fig 4. A sample of DSDS spleen dataset: (a) Raw image; (b) Ground truth (spleen region is indicated as green color); (c) 3D display.

https://doi.org/10.1371/journal.pone.0332482.g004

Before performing the experiments, 3D CT images from the datasets were transformed into 2D slices that can be used in the training phase. When the 41 CT scans in the MSD dataset were divided into slices, a total of 45634 images and 45634 corresponding masks were obtained. However, since a significant number of mask images contained no labels (images without a spleen), the entire dataset could not be used. Therefore, such unlabeled mask images and their corresponding scans were excluded from the dataset. Following this filtering process, 10750 images along with 10750 masks were retained. Furthermore, only slices along the z-axis were utilized. These slices represent cross-sectional layers that encode the depth information of the 3D volume, allowing the original 3D structure to be reconstructed when they are combined. This approach preserves the spatial depth characteristics of the data, which is expected to enhance the effectiveness of the 2D segmentation model. Thus, with the use of slices along the z-axis only, a total of 1050 images and 1050 masks were obtained. On the other hand, 29 CT (z-axis) scans in the DSDS could be used within the context of our study. After dividing them into slices, a total of 1000 images and 1000 corresponding masks were obtained. Before feeding the data into the models, the input dataset was resized to 512 × 512.
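The slicing-and-filtering step can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name `volume_to_slices` and the (H, W, Z) array layout are assumptions. Axial (z-axis) slices whose masks contain no spleen label are discarded:

```python
import numpy as np

def volume_to_slices(volume, masks):
    """Split a 3D volume of shape (H, W, Z) into axial 2D slices and
    drop slices whose mask contains no labeled (spleen) pixels."""
    images, labels = [], []
    for z in range(volume.shape[2]):
        if masks[:, :, z].any():          # keep only labeled slices
            images.append(volume[:, :, z])
            labels.append(masks[:, :, z])
    return np.stack(images), np.stack(labels)
```

In practice, resizing each retained slice to 512 × 512 would follow this step before training.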

To ensure a reliable assessment of the performance of the SVNC-Net, both datasets were divided into 70% for training, 20% for validation, and 10% for testing. This distribution was chosen to effectively balance the evaluation of the generalization ability across different data subsets.

Overview of the baseline models

In the experiments, the well-known and efficient CNN-based models in medical image segmentation, such as the UPerNet, EMANet, CCNet, SegNet, and ShuffleNet, were used as baseline models to verify the effectiveness of the SVNC-Net. Although these models have been implemented in various medical image segmentation tasks, their performances on spleen segmentation from CT images have not been reported yet. Therefore, we also aimed to assess their performances for the first time in the literature. In the following, the baseline models are briefly described.

UPerNet.

UPerNet is a high-performance semantic segmentation model that integrates multi-scale feature representations through a Feature Pyramid Network (FPN) and a Pyramid Pooling Module (PPM) [29]. The architecture employs a deep convolutional backbone, such as ResNet-50, ResNet-101, or a transformer-based model like Swin Transformer, to extract multi-resolution feature maps. The FPN enhances feature propagation across scales, while the PPM captures global contextual dependencies by applying spatial pooling at multiple levels (e.g., 1 × 1, 2 × 2, 3 × 3, and 6 × 6). The decoder refines these representations through upsampling and convolutional operations to generate precise segmentation maps.
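The PPM's multi-level spatial pooling can be illustrated with a small NumPy sketch. The helper below is hypothetical and uses a single-channel square feature map for clarity; it averages the map over 1 × 1, 2 × 2, 3 × 3, and 6 × 6 grids, mirroring the pooling levels listed above:

```python
import numpy as np

def pyramid_pool(fmap, levels=(1, 2, 3, 6)):
    """Adaptive average pooling of a square feature map at several
    grid sizes, returning one pooled grid per level (PPM sketch)."""
    h = fmap.shape[0]
    pooled = []
    for s in levels:
        edges = np.linspace(0, h, s + 1).astype(int)  # bin boundaries
        grid = np.array([[fmap[edges[i]:edges[i + 1], edges[j]:edges[j + 1]].mean()
                          for j in range(s)] for i in range(s)])
        pooled.append(grid)
    return pooled
```

In UPerNet the pooled grids are upsampled back to the feature-map resolution and concatenated, giving the decoder access to global context.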

UPerNet-based models are effectively used in medical imaging for multi-organ segmentation, particularly in abdominal and thoracic imaging, liver segmentation, and brain tumor segmentation, using multi-scale feature aggregation [46–48].

CCNet.

CCNet addresses the computational inefficiency of traditional self-attention mechanisms by introducing a Criss-Cross Attention (CCA) module, which captures contextual dependencies through a series of recurrent attention operations along horizontal and vertical directions [31]. The backbone, typically ResNet-50 or ResNet-101, extracts hierarchical feature representations, which are then processed by the CCA module. By iteratively aggregating context from criss-cross paths, the model effectively captures long-range dependencies without the computational overhead of full self-attention.

CCA-based models are utilized in medical image segmentation for lung nodule segmentation, cardiac MRI segmentation, and real-time histopathology image segmentation due to their ability to model contextual dependencies and enhance boundary delineation [49,50].

EMANet.

EMANet introduces an Expectation-Maximization (EM) attention mechanism to efficiently capture long-range dependencies in medical image segmentation [30]. The network utilizes a deep CNN backbone such as ResNet-50 or ResNet-101 to extract initial feature maps, which are then processed by the EM Attention Module. This module employs an iterative EM algorithm to learn a compact set of attention bases that summarize global contextual information. These bases are iteratively refined and used to generate attention coefficients that enhance feature representations while maintaining computational efficiency.

EM-based models are effective in medical imaging for segmenting heterogeneous structures like brain lesions and ischemic strokes, improving segmentation accuracy in complex organs like pancreatic tumors, and distinguishing between fine vessel structures and background noise [51–53].

SegNet.

SegNet is an encoder-decoder-based segmentation model designed for efficient memory utilization and real-time inference [32]. The encoder follows the architecture of VGG-16, consisting of 13 convolutional layers with 3 × 3 kernels and ReLU activations. Spatial resolution is progressively reduced through five max-pooling layers, and the indices of these pooling operations are stored for later use in the decoder. Unlike conventional architectures, SegNet performs unpooling using these stored indices, allowing it to restore spatial resolution efficiently without the need for learned upsampling. The decoder consists of five unpooling layers followed by convolutional layers that refine the feature representations before classification.
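SegNet's index-based unpooling can be sketched in NumPy as follows. This is an illustrative single-channel version, not the library implementation: pooling records the flat position of each window maximum, and unpooling writes the pooled values back to those positions, leaving zeros elsewhere:

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also records each maximum's flat index."""
    H, W = x.shape
    out = np.zeros((H // k, W // k))
    idx = np.zeros((H // k, W // k), dtype=int)
    for i in range(H // k):
        for j in range(W // k):
            win = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i, j] = win[r, c]
            idx[i, j] = (i * k + r) * W + (j * k + c)
    return out, idx

def max_unpool(pooled, idx, shape):
    """Scatter pooled values back to their recorded positions."""
    out = np.zeros(int(np.prod(shape)))
    out[idx.ravel()] = pooled.ravel()
    return out.reshape(shape)
```

Because only indices (not learned weights) are needed, this upsampling is memory-efficient, which is the core of SegNet's design.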

Models based on SegNet are widely used in biomedical image segmentation tasks, particularly in lesion segmentation and histopathology image segmentation, due to their ability to retain spatial features and process high-resolution microscopy data [54–56].

ShuffleNet.

ShuffleNet is a computationally efficient deep learning model designed for high-speed inference on resource-constrained hardware, such as mobile and embedded systems [33]. The architecture incorporates pointwise group convolutions to reduce computational complexity, depthwise separable convolutions to enhance spatial feature extraction, and channel shuffle operations to improve information exchange between grouped features. The network consists of multiple ShuffleNet units, where each unit performs a series of grouped and depthwise convolutions, followed by channel shuffling to ensure effective feature propagation. The final stage of the network includes global average pooling and a fully connected layer for classification.

ShuffleNet-based models are used in medical image segmentation for real-time applications like lung ultrasound segmentation in COVID-19 diagnosis, and retinal fundus image segmentation [57,58].
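The channel shuffle operation described above is a simple reshape-transpose-reshape; a minimal NumPy sketch (assuming an NCHW tensor layout) is:

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet channel shuffle on an (N, C, H, W) array:
    split C into (groups, C//groups), transpose the two axes, flatten."""
    n, c, h, w = x.shape
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))
```

After the shuffle, each group in the next grouped convolution receives channels originating from every group of the previous layer, which is what restores cross-group information flow.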

Training and implementation

In the training phase, the Adam optimizer was selected due to its robustness in handling the gradient instability commonly encountered in medical image segmentation tasks. In such problems, especially with organ segmentation from CT scans, some regions of the image contain rich anatomical information, while large areas may correspond to background with minimal signal. This imbalance often results in uneven gradient magnitudes across the network, with certain layers receiving weak gradients and others experiencing sharp updates. Adam optimizer offers a practical solution by incorporating estimates of both the first and second moments of the gradients to suppress noisy updates and sustain learning even in regions with low signal. Additionally, its adaptive learning rate mechanism allows each parameter to be updated at an appropriate scale, which is particularly beneficial in encoder-decoder architectures where the information density may vary significantly between layers. As a result, Adam contributes to faster convergence and improved training stability in such heterogeneous data settings.

However, to achieve optimal performance, the learning rate was not fixed to a single value. Instead, cosine annealing with warm restarts was employed to accelerate convergence [59]. In this approach, the learning rate (η_t) is dynamically adjusted at each iteration based on a cosine function:

η_t = η_min + (1/2)(η_max − η_min)(1 + cos(π T_cur / T_i))    (4)

where i is the index of the run, T_i is the number of epochs between two warm restarts that defines the duration of each cosine annealing cycle, T_cur is the number of epochs since the last restart, η_min is the minimum learning rate, and η_max is set to the initial learning rate. In the training phase, then, η_max was set to 1 × 10⁻³, and η_min was set to 1 × 10⁻⁵. The values of T_i were set to 10, 20, and 20 epochs for the first, second, and third cycles, respectively. Since the training of the models was completed in 50 epochs, T_i for the third cycle was set to 20.
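The schedule of Eq. (4) can be reproduced directly. The sketch below is a plain-Python illustration using the settings stated above (η_max = 1 × 10⁻³, η_min = 1 × 10⁻⁵, cycle lengths 10, 20, 20); in practice, a PyTorch scheduler such as `CosineAnnealingWarmRestarts` would likely be used instead.

```python
import math

def cosine_annealing_lr(t_cur, t_i, eta_min=1e-5, eta_max=1e-3):
    """Eq. (4): learning rate for cosine annealing with warm restarts.

    t_cur: epochs since the last restart; t_i: length of the current cycle.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

# Cycle lengths used in this study: 10, 20, 20 epochs (50 epochs total).
schedule = []
for t_i in (10, 20, 20):
    schedule.extend(cosine_annealing_lr(t, t_i) for t in range(t_i))
```

At each restart (t_cur = 0) the rate jumps back to η_max and then decays along the cosine curve toward η_min, which is the "warm restart" behavior that helps escape poor local minima.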

To improve segmentation accuracy, the loss value was optimized using a combination of Dice Loss and Binary Cross-Entropy Loss. The combined loss (L_total) can be defined as [60]:

L_total = α L_Dice + (1 − α) L_BCE    (5)

where L_Dice is the Dice Loss, L_BCE is the Binary Cross-Entropy Loss, and α is the weighting factor that balances the contribution of the two loss components. To ensure that Dice Loss and Binary Cross-Entropy Loss contribute equally during training, the value of α was set to 0.5.
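A minimal sketch of this combined loss is given below, operating on flattened probability maps as plain Python lists; an actual training loop would use tensor-based equivalents (e.g., PyTorch's `BCELoss` plus a soft Dice term), but the arithmetic is the same.

```python
import math

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flattened probability maps (floats in [0, 1])."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy, averaged over pixels."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def combined_loss(pred, target, alpha=0.5):
    """Eq. (5): L_total = alpha * L_Dice + (1 - alpha) * L_BCE, alpha = 0.5 here."""
    return alpha * dice_loss(pred, target) + (1 - alpha) * bce_loss(pred, target)
```

The Dice term counteracts the foreground/background imbalance typical of spleen CT slices, while the BCE term provides smooth pixel-wise gradients; α = 0.5 weights the two equally.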

Moreover, a stratified partitioning strategy was adopted to ensure a reliable evaluation of the model’s generalization capability while maintaining computational efficiency. Initially, the dataset was divided into 10 equal-sized, stratified folds. Then, three independent evaluation rounds were designed, each following the standard 70% training, 20% validation, and 10% testing split. In each round, 7 folds were used for training, 2 for validation, and 1 for testing, as shown in Table 3. The main purpose of this approach was to balance computational efficiency and statistical reliability. This approach is consistent with similar practices in the literature, where three to five evaluation rounds are often used [61,62]. While it does not correspond to full 10-fold cross-validation, which would require 10 independent training runs to preserve the same data proportions, it still allows for independent evaluation of the model across multiple test subsets. Importantly, not all folds were used as test sets. Instead, three distinct folds were selected for testing across the evaluation rounds, and the remaining folds were reused in different roles. It should also be noted that the validation sets were treated as internal evaluations used during training, while the test sets represent external, held-out evaluations used solely for final performance estimation. Although validation data were used during model development, no validation performance metrics were reported.
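The round structure can be sketched as follows. This is an illustrative rotation only, hypothetical in its exact fold-to-role assignment (the study's actual assignment is given in its Table 3); it simply shows how 10 folds yield three rounds with a 7/2/1 train/validation/test split and distinct test folds.

```python
def evaluation_rounds(n_folds=10, n_rounds=3):
    """Assign folds to roles per round: 7 train, 2 validation, 1 test.

    Test folds are distinct across rounds; remaining folds are reused in
    other roles. Illustrative rotation, not the paper's exact Table 3.
    """
    rounds = []
    for r in range(n_rounds):
        # Rotate the fold indices so each round gets a fresh test fold.
        folds = [(r + i) % n_folds for i in range(n_folds)]
        rounds.append({"test": folds[:1], "val": folds[1:3], "train": folds[3:]})
    return rounds
```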

The experiments were conducted in a GPU environment provided by Google Colab that is optimized for processes requiring high computational power. More specifically, a single NVIDIA Tesla T4 GPU was used for the experiments. Moreover, the implementation was performed using the Python programming language with the PyTorch library.

In summary, the training hyperparameters and implementation details discussed in this section are provided in Table 4.

Post-Training compression

In addition to standard training and evaluation, this study investigates the impact of post-training model compression techniques, specifically pruning and quantization, on the SVNC-Net and baseline models. These techniques were applied only to models trained with the MSD dataset, to isolate computational effects from dataset-specific variances and avoid unnecessary complexity.

Pruning strategy.

A model-aware pruning strategy was implemented to enable simplification while preserving the structural integrity of each network. Two pruning techniques were employed. The first was weight pruning, which removes low-magnitude weights throughout the network to introduce sparsity with minimal disruption [63]. The second was neuron/filter pruning, which eliminates entire filters or neurons, resulting in a more substantial reduction in model complexity [64].

The pruning approach was tailored to the architectural scale of each model. For more complex networks, including UPerNet, EMANet, CCNet, and SegNet, a combined strategy was applied, consisting of 15% weight pruning and 20% neuron/filter pruning. In contrast, for lightweight architectures such as SVNC-Net and ShuffleNet, only 15% weight pruning was employed to preserve the performance characteristics of these compact models. Importantly, no fine-tuning was conducted after pruning, allowing for a direct evaluation of the unrefined impact of sparsity on model performance.
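The weight-pruning step can be illustrated with a simple magnitude criterion; the sketch below operates on a flat list standing in for one layer's parameters (in a PyTorch workflow, utilities such as `torch.nn.utils.prune.l1_unstructured` would likely be used instead).

```python
def weight_prune(weights, sparsity=0.15):
    """Unstructured magnitude pruning: zero out the smallest-|w| fraction.

    `weights` stands in for one layer's parameters; 15% sparsity matches
    the level applied to SVNC-Net and ShuffleNet in this study.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

Filter pruning, applied to the larger models, works analogously but removes whole output channels (e.g., ranked by a filter-level L1 norm) rather than individual weights, which is why it cuts parameter counts more aggressively.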

Quantization strategy.

PTQ was applied using 8-bit integer precision, converting model weights and activations from 32-bit floating point (float32) to 8-bit integer (int8) representations without additional retraining. This approach was adopted to emulate deployment scenarios on resource-constrained or embedded systems [65].
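The float32 → int8 conversion can be sketched with a simple symmetric quantizer; this is an illustration of the arithmetic, not the exact PTQ backend used (framework toolchains such as PyTorch's quantization API also calibrate activation ranges, which is omitted here).

```python
def quantize_int8(values):
    """Symmetric post-training quantization of float values to int8."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0  # one float scale shared by all values
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [qi * scale for qi in q]
```

Each weight is stored as a single signed byte plus a shared scale, which is the source of the roughly 4x memory reduction; the rounding step bounds the per-weight error by about half a quantization step.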

To assess the impact of both pruning and quantization, several evaluation metrics were reported, including post-training Intersection over Union (IoU), memory usage after compression, and inference time. These metrics served to quantify the trade-offs between segmentation accuracy and computational efficiency across the different model architectures.

Results

This section is devoted to discussing the results obtained from the experiments. To this end, first, the metrics that were used for quantifying the performance of the models are outlined. Then, the effectiveness of SVNC-Net and the baseline models on the MSD dataset and DSDS is comparatively assessed.

Performance metrics

To evaluate the efficiency of SVNC-Net and baseline models, it is essential to employ specific performance metrics that comprehensively assess their segmentation accuracy and computational demands. In the context of segmentation accuracy, key evaluation metrics, including IoU, mean IoU (mIoU), and the Dice Similarity Coefficient (DSC), were utilized. Additionally, computational efficiency was assessed using metrics such as inference time, total memory usage, training time, and the number of model parameters. A brief explanation of each metric is presented in the following sections.

Intersection over union (IoU)

The IoU metric is a generalization of the Jaccard index, which is widely used for evaluating the accuracy of segmentation algorithms [66]. It measures the overlap between the predicted segmentation region and the ground truth segmentation region in medical images, such as organs, tumors, or lesions. It can be calculated from the predicted segmentation (A) and the ground truth segmentation (B):

IoU = |A ∩ B| / |A ∪ B|    (6)

where |A ∩ B| denotes the number of pixels shared between the predicted and ground truth masks, and |A ∪ B| denotes the total number of pixels in either the predicted or ground truth masks. The IoU ranges from 0 to 1, inclusive. Higher values indicate higher segmentation performance.
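Eq. (6) reduces to a few lines over binary masks; the sketch below treats masks as flat 0/1 lists (a tensor implementation would be analogous):

```python
def iou(pred, gt):
    """Eq. (6): IoU between binary masks given as flat 0/1 integer lists."""
    inter = sum(p & g for p, g in zip(pred, gt))  # pixels in both masks
    union = sum(p | g for p, g in zip(pred, gt))  # pixels in either mask
    return inter / union if union else 1.0  # empty/empty counts as perfect
```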

Mean intersection over union (mIoU)

The mIoU is the average IoU over all classes in a given dataset. It can be expressed by averaging the IoU values for all segmentation classes:

mIoU = (1/C) Σ_{c=1}^{C} IoU_c    (7)

where C is the total number of classes, and IoU_c is the IoU for the c-th class, which can be defined as

IoU_c = |A_c ∩ B_c| / |A_c ∪ B_c|    (8)

where A_c is the predicted segmentation for the c-th class, and B_c is the ground truth segmentation for the c-th class.

The mIoU metric is particularly important in multi-class segmentation tasks. Nevertheless, in this study, mIoU was computed by averaging the IoU values of the spleen (foreground) and background classes to better reflect overall segmentation performance. In fact, the metric was adopted by calculating the IoU separately for both classes and then averaging the results. This approach aligns with the pixel-wise classification nature of segmentation tasks, where both foreground and background predictions are important. Including background performance is particularly useful for evaluating how well the model balances class predictions, especially in cases with significant class imbalance [67].
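The two-class averaging described above can be sketched directly; the snippet below treats masks as flat lists of class labels (1 = spleen, 0 = background):

```python
def miou_binary(pred, gt):
    """Mean IoU over the spleen (1) and background (0) classes, per Eq. (7)."""
    def iou_for(c):
        # Per-class IoU as in Eq. (8), on label masks rather than binary masks.
        inter = sum((p == c) and (g == c) for p, g in zip(pred, gt))
        union = sum((p == c) or (g == c) for p, g in zip(pred, gt))
        return inter / union if union else 1.0
    return 0.5 * (iou_for(1) + iou_for(0))
```

Because the background class dominates most CT slices, its IoU is typically high, which is why reported mIoU values sit above the foreground IoU alone.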

Dice similarity coefficient (DSC)

The DSC is a spatial overlap index proposed by Dice [68] that measures the overlap between the predicted and ground truth segmentations. It can be defined as follows:

DSC = 2|A ∩ B| / (|A| + |B|)    (9)

As it is often used alongside IoU, it can also be expressed in terms of IoU:

DSC = 2 IoU / (1 + IoU)    (10)

The DSC ranges from 0 to 1, where a value of one indicates the highest segmentation accuracy, and lower values indicate significant deviations between the predictions and the ground truth mask.
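As with IoU, Eq. (9) is a one-line computation on binary masks (again sketched on flat 0/1 lists):

```python
def dsc(pred, gt):
    """Eq. (9): Dice similarity coefficient between binary 0/1 masks."""
    inter = sum(p & g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)  # |A| + |B|
    return 2 * inter / total if total else 1.0
```

Note that Eq. (10) is a monotone mapping of IoU, so DSC is always at least as large as IoU for the same prediction; for example, an IoU of 1/3 corresponds to a DSC of 0.5.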

Computational efficiency metrics

In clinical applications, particularly medical image segmentation, the selection of optimal DL models requires a comprehensive evaluation of computational cost and scalability. Key indicators such as inference time, total memory usage, training time, and the number of model parameters may directly affect the feasibility and efficiency of deployment. Therefore, in this study, these metrics were utilized to provide alternative assessments of model efficiency beyond segmentation accuracy.

Rapid inference is important for real-time diagnostics and efficient clinical workflows, while minimized memory usage facilitates deployment on resource-constrained medical devices and reduces operational costs. Furthermore, expedited training times enable rapid model development, adaptation to evolving clinical data, and accelerated research. An acceptable balance in model parameterization ensures robust generalization, mitigates overfitting, and optimizes computational resource utilization, thereby enhancing the reliability and applicability of AI-driven medical image analysis.

The significance of these metrics becomes more prevalent in medical image segmentation due to the complex nature of medical imagery and the critical need for accurate and timely analyses. In clinical practice, where decisions are often time-sensitive and resource limitations are tight, models should demonstrate both high accuracy and computational efficiency. Thus, the selection process requires a comprehensive assessment of these factors to ensure that the chosen model not only meets the diagnostic requirements but also aligns with the practical constraints of the clinical environment.

Segmentation accuracy

The segmentation accuracy of SVNC-Net and the baseline models on the MSD and DSDS datasets, obtained from the 10% testing subset, is presented in Table 5. As can be seen from the table, SVNC-Net achieved an IoU of 0.89, an mIoU of 0.94, and a DSC of 0.94 on the MSD dataset. This indicates that SVNC-Net performed competitively, matching the performance of significantly larger models: UPerNet achieved an IoU of 0.87, an mIoU of 0.93, and a DSC of 0.93; EMANet achieved an IoU of 0.89, an mIoU of 0.95, and a DSC of 0.94; and CCNet achieved an IoU of 0.91, an mIoU of 0.95, and a DSC of 0.95. Therefore, it can be deduced that SVNC-Net achieves comparable segmentation accuracy on the MSD dataset.

On the DSDS dataset, with an IoU of 0.84, an mIoU of 0.92, and a DSC of 0.92, SVNC-Net achieved the highest performance when compared to baseline models. Although these results are comparable to those of ShuffleNet and UPerNet, both of which achieved an IoU of 0.83, an mIoU of 0.91, and a DSC of 0.91, SVNC-Net still maintained a high accuracy. The ability of SVNC-Net to generalize across multiple datasets further reinforces its applicability in real-world scenarios.

On the other hand, the performance analysis of the models across the MSD dataset and DSDS reveals notable variations. EMANet and CCNet demonstrated better performance on the MSD dataset. However, their performance significantly reduced on the DSDS dataset. This may indicate a lack of generalizability across different datasets. Moreover, SegNet demonstrated poor performance on both datasets, which may raise concerns about its limited effectiveness in segmentation tasks. UPerNet, while performing slightly worse on DSDS in comparison to MSD, maintained a relatively stable and acceptable performance compared to other baseline models. Furthermore, the performance of ShuffleNet was robust, which achieved nearly identical performance across both datasets. Nevertheless, when considering overall results, SVNC-Net outperformed ShuffleNet, particularly when its lightweight architecture is taken into account.

Computational performance

The computational efficiency of the models is provided in Table 6. It is important to note that this table presents the results obtained from the MSD dataset only, as similar results were observed on the DSDS dataset. From the table, the lightweight nature of SVNC-Net can be clearly observed. As listed in the table, it contains only 1 million (M) parameters, which is substantially smaller than other efficient models, such as UPerNet (60.1M) and ShuffleNet (5.5M). SVNC-Net is thus the most compact model, which highlights its design efficiency.

As discussed earlier, inference time is another important metric for selecting a suitable model for clinical applications. According to the results, SVNC-Net demonstrated the lowest inference time (34.4 ms). It is significantly faster than the other models, including the relatively lightweight ShuffleNet (40.3 ms). This is particularly beneficial for real-time applications, where timely segmentation results are critical.

Total memory usage is another defining characteristic of SVNC-Net. Requiring only 4MB of memory, it is more efficient than other efficient models such as UPerNet (253MB) and ShuffleNet (21MB). This makes SVNC-Net highly suitable for deployment in resource-constrained environments, such as embedded systems or clinical workstations with limited computational capacity.

Training efficiency is another advantage of SVNC-Net. With a training time of only 29 min, it significantly outperformed larger models, such as UPerNet (179 min) and CCNet (162 min), which require substantially more computational resources. In particular, it had a lower training time than its lightweight rival ShuffleNet (41 min). In clinical environments, the reduced training time not only enhances practicality but also facilitates rapid model updates.

Compression evaluation

The impact of compression techniques on segmentation models trained on the MSD dataset is summarized in Table 7. In general, pruning (both weighted and filter-based) yielded consistent reductions in parameter count and inference latency, while quantization substantially reduced memory consumption. However, the magnitude of performance degradation varied depending on model architecture and initial complexity. For heavyweight models such as UPerNet and CCNet, pruning led to significant reductions in parameters and faster inference, while only slightly decreasing IoU, from 0.87 to 0.86 for UPerNet and from 0.91 to 0.88 for CCNet. These models, which involve complex multi-branch structures or attention modules, benefit from structured pruning, though quantization appears more challenging, especially for UPerNet, which includes LayerNorm and deeply nested operations known to be sensitive to low-precision computations. While UPerNet’s quantized variant experienced a more pronounced drop in IoU (to 0.83), CCNet exhibited strong robustness, maintaining an IoU of 0.89 after quantization while achieving a 34.7% reduction in memory usage. This may be due to its well-isolated Criss-Cross attention units, which are less susceptible to quantization error propagation. Similarly, EMANet demonstrated excellent compression tolerance, with both pruning and quantization preserving an IoU of 0.88 while reducing memory requirements. This indicates the relative resilience of EM-based attention structures to weight perturbations caused by quantization. In contrast, SegNet displayed heightened sensitivity to pruning, with a significant drop in IoU from 0.82 to 0.77, which can be attributed to its shallow encoder-decoder design lacking residual or normalization pathways to compensate for parameter loss. 
Interestingly, the quantized variant partially recovered accuracy, reaching an IoU of 0.81, which suggests that quantization, especially when applied to simpler architectures, may be a less destructive compression strategy than aggressive filter pruning.

Table 7. Impacts of pruning and quantization on the model performances.

https://doi.org/10.1371/journal.pone.0332482.t007

In the lightweight category, both ShuffleNet and the proposed SVNC-Net exhibited strong compression resilience and real-time feasibility. ShuffleNet maintained competitive IoU values (0.825 after pruning, 0.81 after quantization), with memory usage reduced to just 17 MB. However, this slight drop in accuracy can be attributed to its reliance on grouped and channel-shuffled convolutions, which are structurally more vulnerable to 8-bit quantization errors due to reduced redundancy. Notably, SVNC-Net achieved the best balance between accuracy and efficiency, starting with an IoU of 0.89 and maintaining 0.882 and 0.86 after pruning and quantization, respectively. Despite incorporating depthwise separable convolutions, which are known to be sensitive to low-precision arithmetic, SVNC-Net preserved stable performance, suggesting that its layerwise placement of depthwise separable convolutions effectively mitigates quantization artifacts. With only 1M parameters and a final memory footprint as low as 3 MB, the quantized SVNC-Net proved to be the most lightweight model tested. These findings strongly support the suitability of SVNC-Net for deployment in real-time, edge-based medical image segmentation systems, where both accuracy and efficiency are critical.

Qualitative evaluation

To visually assess the segmentation performance of the proposed SVNC-Net and other baseline models, qualitative comparisons were conducted on MSD and DSDS. Ten representative CT slices, comprising five from each dataset, were selected to illustrate performance differences, particularly in terms of boundary delineation and regional consistency. The results are shown in Figs 5 and 6. In all visualization samples, the ground truth is overlaid as a red boundary line, while the predicted segmentation masks are shown as filled regions for each model. This enables clear visual comparison of boundary alignment and segmentation consistency across different methods. All segmentations were rendered using the same color scheme to maintain visual uniformity. As can be seen from the figures, across both datasets, SVNC-Net displays more precise boundary localization and better region consistency, particularly when compared to ShuffleNet and SegNet. Although UPerNet and CCNet also perform well, their computational overhead is significantly higher. SVNC-Net achieves a competitive visual quality while maintaining lightweight and real-time feasibility, confirming its suitability for edge-based clinical applications.

Fig 5. Qualitative comparison of spleen segmentation results on the MSD dataset: (a) Input CT slice, (b) Ground Truth (red boundary), (c) UPerNet prediction, (d) EMANet prediction, (e) CCNet prediction, (f) SegNet prediction, (g) ShuffleNet prediction, (h) SVNC-Net prediction.

The red contours represent the expert-annotated ground truth masks.

https://doi.org/10.1371/journal.pone.0332482.g005

Fig 6. Qualitative comparison of spleen segmentation results on the DSDS: (a) Input CT slice, (b) Ground Truth (red boundary), (c) UPerNet prediction, (d) EMANet prediction, (e) CCNet prediction, (f) SegNet prediction, (g) ShuffleNet prediction, (h) SVNC-Net prediction.

The red contours represent the expert-annotated ground truth masks.

https://doi.org/10.1371/journal.pone.0332482.g006

Discussion

The results obtained from the experiments validate that SVNC-Net is a lightweight, efficient, and clinically applicable spleen segmentation model from 3D CT scans. To the best of our knowledge, no prior model has been specifically designed to achieve such an optimal balance between segmentation accuracy and computational efficiency. SVNC-Net’s ability to maintain high segmentation performance while minimizing computational costs establishes its suitability for real-world clinical deployment.

Trade-off between lightweight and continuity

SVNC-Net presents an optimal trade-off between segmentation accuracy and computational efficiency. Its lightweight structure ensures ease of deployment, while its low inference time and memory consumption make it ideal for real-time applications. The combination of high accuracy and computational efficiency highlights its potential for spleen segmentation in clinical settings, where timely and reliable segmentation is essential for diagnostic and treatment planning. Given these advantages, SVNC-Net represents a significant advancement in spleen segmentation from CT scans, providing a robust solution that meets the demands of modern clinical workflows.

SVNC-Net operates exclusively on 2D slices, processing 3D data by converting it into individual 2D sections. This approach inherently leads to the potential loss of spatial contextual information across the x, y, and z axes. Since 3D structures rely on inter-slice dependencies, this constraint could affect the ability to fully capture volumetric relationships. However, despite this theoretical limitation, the model remains highly effective in 2D segmentation tasks. Its primary focus is on extracting features from individual slices, and its performance does not show a significant decline due to the absence of explicit 3D spatial reasoning. The architecture of SVNC-Net is well-optimized for its intended application, demonstrating strong segmentation capabilities even without direct volumetric awareness.

One key area where the lack of 3D context could affect performance is in object boundary definition. In cases where boundaries are ambiguous, additional spatial cues from adjacent slices could enhance segmentation accuracy. Without such information, SVNC-Net might face challenges in distinguishing closely positioned or overlapping structures. However, the lightweight and simplistic architecture of the model minimizes the impact of this limitation. Unlike models that rely on highly rigid feature learning, SVNC-Net maintains a more flexible representation, reducing the risk of structural overfitting. This characteristic allows the model to be less sensitive to the absence of 3D contextual information, mitigating the potential drawbacks of its 2D-based approach.

Therefore, while the absence of direct 3D spatial reasoning introduces constraints, SVNC-Net continues to deliver robust performance in 2D segmentation tasks. The balance between computational efficiency and segmentation accuracy remains favorable, making it well-suited for applications where computational cost is prioritized over full volumetric understanding.

Post-hoc compression and clinical feasibility

In addition to its native efficiency, SVNC-Net benefits further from post-hoc compression techniques. As discussed earlier, pruning and 8-bit PTQ were applied to reduce model size, latency, and memory usage, with minimal impact on segmentation accuracy. These methods offer practical gains in edge-based clinical deployment, where hardware constraints are critical.

Notably, weight pruning was selectively applied to lightweight models such as ShuffleNet and SVNC-Net, rather than filter-based pruning. This decision stems from the architectural sensitivity of such models: aggressive structural pruning may distort spatial filters due to their already minimal redundancy.

The compression results can be summarized as follows:

  • The memory usage of SVNC-Net was reduced by 25% through quantization (from 4MB to 3MB), with only a 3.4% drop in IoU.
  • Inference time decreased from 34.4 ms to 30 ms via quantization, enhancing real-time capability.
  • Parameter count was reduced by 10% through weight pruning, with only a 0.9% performance drop.

These outcomes validate that SVNC-Net not only performs well in its original configuration but is also amenable to further compression without violating performance thresholds essential for clinical reliability.

Clinical implications of segmentation accuracy

While computational efficiency remains a primary motivation for the development of SVNC-Net, its segmentation accuracy also holds meaningful clinical relevance. In clinical settings, even marginal improvements in segmentation accuracy can lead to more reliable estimations of organ volume, which are crucial for diagnosing conditions such as splenomegaly. Particularly, in borderline cases where the spleen volume is close to the diagnostic threshold, a slight increase in accuracy metrics, such as IoU or DSC, may reduce the risk of misdiagnosis or unnecessary follow-up procedures.

Moreover, SVNC-Net consistently demonstrated high segmentation performance across two distinct datasets, despite their differences in imaging protocols and population characteristics. This stability may suggest a high degree of robustness and generalizability, which are essential for deployment in real-world clinical environments where variations in scan quality, acquisition protocols, and patient anatomy are frequently encountered.

Although some baseline models demonstrated comparable segmentation accuracies, SVNC-Net achieved this performance with significantly lower memory usage, faster inference, and greater robustness to compression. Importantly, this balance of accuracy, efficiency, and resilience is particularly valuable for deployment on edge devices in point-of-care settings, where limited computational resources coincide with the need for reliable and timely predictions. By maintaining competitive accuracy while significantly reducing memory usage and inference time, SVNC-Net offers a clinically viable solution for automated spleen segmentation in time-sensitive and resource-constrained clinical workflows.

Conclusion

This study proposes SVNC-Net, which is a lightweight 2D CNN-based model for efficient 3D spleen segmentation from CT scans. SVNC-Net employs depthwise separable convolutions to enhance computational efficiency while maintaining high segmentation accuracy. Evaluated on the MSD and DSDS, SVNC-Net achieved competitive segmentation accuracy, with an IoU of 0.89 and DSC of 0.94 on MSD, and an IoU of 0.84 and DSC of 0.92 on DSDS. Additionally, SVNC-Net outperformed well-known CNN-based models in computational efficiency, with only 1M parameters, the lowest inference time (34.4 ms), and minimal memory usage (4 MB). With a training time of 29 minutes, SVNC-Net is highly suitable for real-time clinical applications, offering a promising approach for lightweight 2D deep learning models in 3D organ segmentation. Furthermore, post-hoc compression results revealed that quantization reduced memory usage by 25% (from 4 MB to 3 MB) and inference time by 12.8%, with only a 3.4% drop in IoU, demonstrating the model's robustness under hardware-friendly optimizations.

In conclusion, this study demonstrates that SVNC-Net is a computationally efficient and highly accurate model for 3D spleen segmentation from CT scans. Using a lightweight 2D architecture, it significantly reduces the computational burden while maintaining segmentation accuracy comparable to larger models. The higher performance of SVNC-Net in terms of inference speed and memory usage highlights its potential for integration into clinical workflows, particularly in resource-constrained environments. In addition, the successful application of pruning and quantization techniques further demonstrates the potential of SVNC-Net for real-time deployment on edge medical devices. Future research can explore its application to other organ segmentation tasks and further optimize its architecture for broader medical imaging applications.

References

  1. 1. Mebius RE, Kraal G. Structure and function of the spleen. Nat Rev Immunol. 2005;5(8):606–16. pmid:16056254
  2. 2. Pozo AL, Godfrey EM, Bowles KM. Splenomegaly: investigation, diagnosis and management. Blood Rev. 2009;23(3):105–11. pmid:19062140
  3. 3. Curovic Rotbain E, Lund Hansen D, Schaffalitzky de Muckadell O, Wibrand F, Meldgaard Lund A, Frederiksen H. Splenomegaly - Diagnostic validity, work-up, and underlying causes. PLoS One. 2017;12(11):e0186674. pmid:29135986
  4. 4. Chapman J, Goyal A, Azevedo AM. Splenomegaly. Treasure Island (FL): StatPearls Publishing. 2025.
  5. 5. Robertson F, Leander P, Ekberg O. Radiology of the spleen. Eur Radiol. 2001;11(1):80–95. pmid:11194923
  6. 6. Lamb PM, Lund A, Kanagasabay RR, Martin A, Webb JAW, Reznek RH. Spleen size: how well do linear ultrasound measurements correlate with three-dimensional CT volume assessments? Br J Radiol. 2002;75(895):573–7. pmid:12145129
  7. 7. Huo Y, Xu Z, Bao S, Bermudez C, Moon H, Parvathaneni P, et al. Splenomegaly Segmentation on Multi-Modal MRI Using Deep Convolutional Networks. IEEE Trans Med Imaging. 2019;38(5):1185–96. pmid:30442602
  8. 8. Caglar V, Alkoc OA, Uygur R, Serdaroglu O, Ozen OA. Determination of normal splenic volume in relation to age, gender and body habitus: a stereological study on computed tomography. Folia Morphol (Warsz). 2014;73(3):331–8. pmid:25465038
  9. 9. Linguraru MG, Sandberg JK, Li Z, Shah F, Summers RM. Automated segmentation and quantification of liver and spleen from CT images using normalized probabilistic atlases and enhancement estimation. Med Phys. 2010;37(2):771–83. pmid:20229887
  10. 10. Wood A, Soroushmehr SMR, Farzaneh N, Fessell D, Ward KR, Gryak J, et al. Fully Automated Spleen Localization And Segmentation Using Machine Learning And 3D Active Contours. Annu Int Conf IEEE Eng Med Biol Soc. 2018;2018:53–6. pmid:30440339
  11. 11. Sykes J. Reflections on the current status of commercial automated segmentation systems in clinical practice. J Med Radiat Sci. 2014;61(3):131–4. pmid:26229648
  12. Mall PK, Singh PK, Srivastav S, Narayan V, Paprzycki M, Jaworska T, et al. A comprehensive review of deep neural networks for medical image processing: Recent developments and future opportunities. Healthcare Analytics. 2023;4:100216.
  13. Roth HR, Oda H, Zhou X, Shimizu N, Yang Y, Hayashi Y, et al. An application of cascaded 3D fully convolutional networks for medical image segmentation. Comput Med Imaging Graph. 2018;66:90–9. pmid:29573583
  14. Gibson E, Giganti F, Hu Y, Bonmati E, Bandula S, Gurusamy K, et al. Automatic Multi-Organ Segmentation on Abdominal CT With Dense V-Networks. IEEE Trans Med Imaging. 2018;37(8):1822–34. pmid:29994628
  15. Huo Y, Xu Z, Bao S, Bermudez C, Plassard AJ, Yao Y, et al. Splenomegaly segmentation using global convolutional kernels and conditional generative adversarial networks. In: Medical Imaging 2018: Image Processing, 2018.
  16. Huo Y, Xu Z, Moon H, Bao S, Assad A, Moyo TK, et al. SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth. IEEE Trans Med Imaging. 2018. pmid:30334788
  17. Sandfort V, Yan K, Pickhardt PJ, Summers RM. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep. 2019;9(1):16884. pmid:31729403
  18. Su TY, Fang YH. Automatic liver and spleen segmentation with CT images using multi-channel U-net deep learning approach. In: Lin KP, Magjarevic R, De Carvalho P, editors. Future trends in biomedical and health informatics and cybersecurity in medical devices. Cham: Springer International Publishing. 2020:33–41.
  19. Humpire-Mamani GE, Bukala J, Scholten ET, Prokop M, van Ginneken B, Jacobs C. Fully Automatic Volume Measurement of the Spleen at CT Using Deep Learning. Radiol Artif Intell. 2020;2(4):e190102. pmid:33937830
  20. Meddeb A, Kossen T, Bressem KK, Hamm B, Nagel SN. Evaluation of a Deep Learning Algorithm for Automated Spleen Segmentation in Patients with Conditions Directly or Indirectly Affecting the Spleen. Tomography. 2021;7(4):950–60. pmid:34941650
  21. Yang Y, Tang Y, Gao R, Bao S, Huo Y, McKenna MT, et al. Validation and estimation of spleen volume via computer-assisted segmentation on clinically acquired CT scans. J Med Imaging (Bellingham). 2021;8(1):014004. pmid:33634205
  22. Ramkumar G, Prabu RT, Anitha G, Mohanavel V, Tamilselvi M. Fully automated spleen segmentation in patients using convolutional neural network on CT images. In: 2022 Sixth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), 2022.
  23. Lee S, Elton DC, Yang AH, Koh C, Kleiner DE, Lubner MG, et al. Fully Automated and Explainable Liver Segmental Volume Ratio and Spleen Segmentation at CT for Diagnosing Cirrhosis. Radiol Artif Intell. 2022;4(5):e210268. pmid:36204530
  24. Perez AA, Noe-Kim V, Lubner MG, Somsen D, Garrett JW, Summers RM, et al. Automated Deep Learning Artificial Intelligence Tool for Spleen Segmentation on CT: Defining Volume-Based Thresholds for Splenomegaly. AJR Am J Roentgenol. 2023;221(5):611–9. pmid:37377359
  25. Somasundaram E, Taylor Z, Alves VV, Qiu L, Fortson BL, Mahalingam N, et al. Deep Learning Models for Abdominal CT Organ Segmentation in Children: Development and Validation in Internal and Heterogeneous Public Datasets. AJR Am J Roentgenol. 2024;223(1):e2430931. pmid:38691411
  26. Moon H, Huo Y, Abramson RG, Peters RA, Assad A, Moyo TK, et al. Acceleration of spleen segmentation with end-to-end deep learning method and automated pipeline. Comput Biol Med. 2019;107:109–17. pmid:30798219
  27. Zettler N, Mastmeyer A. Comparison of 2D vs. 3D U-Net organ segmentation in abdominal 3D CT images. In: Proceedings of the 29th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, 2021:41–50.
  28. Yuan Z, Puyol-Anton E, Jogeesvaran H, Inusa B, King AP. Deep Learning Framework for Spleen Volume Estimation from 2D Cross-sectional Views. arXiv; 2023.
  29. Xiao T, Liu Y, Zhou B, Jiang Y, Sun J. Unified perceptual parsing for scene understanding. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018:418–34.
  30. Li X, Zhong Z, Wu J, Yang Y, Lin Z, Liu H. Expectation-maximization attention networks for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019:9167–76.
  31. Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W. CCNet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019:603–12.
  32. Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39(12):2481–95. pmid:28060704
  33. Zhang X, Zhou X, Lin M, Sun J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018:6848–56.
  34. Antonelli M, Reinke A, Bakas S, Farahani K, Kopp-Schneider A, Landman BA, et al. The Medical Segmentation Decathlon. Nat Commun. 2022;13(1):4128. pmid:35840566
  35. Wang Y, Macdonald JA, Morgan KR, Hom D, Cubberley S, Sollace K. Duke spleen data set: A publicly available spleen MRI and CT dataset for training segmentation. arXiv; 2023.
  36. Yu Q, Xia Y, Xie L, Fishman EK, Yuille AL. Thickened 2D Networks for Efficient 3D Medical Image Segmentation. arXiv; 2019.
  37. Ushinsky A, Bardis M, Glavis-Bloom J, Uchio E, Chantaduly C, Nguyentat M, et al. A 3D-2D Hybrid U-Net Convolutional Neural Network Approach to Prostate Organ Segmentation of Multiparametric MRI. AJR Am J Roentgenol. 2021;216(1):111–6. pmid:32812797
  38. Xia Y, Xie L, Liu F, Zhu Z, Fishman EK, Yuille AL. Bridging the Gap Between 2D and 3D Organ Segmentation with Volumetric Fusion Net. Lecture Notes in Computer Science. Springer International Publishing. 2018:445–53.
  39. Chen W, Gong X, Liu X, Zhang Q, Li Y, Wang Z. FasterSeg: Searching for Faster Real-time Semantic Segmentation. arXiv; 2020.
  40. Han S, Mao H, Dally WJ. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv; 2016.
  41. Zhang R, Chung ACS. EfficientQ: An efficient and accurate post-training neural network quantization method for medical image segmentation. Med Image Anal. 2024;97:103277. pmid:39094461
  42. Xu X, Lu Q, Yang L, Hu S, Chen D, Hu Y, et al. Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018:8300–8.
  43. Agarwal M, Agarwal S, Saba L, Chabert GL, Gupta S, Carriero A, et al. Eight pruning deep learning models for low storage and high-speed COVID-19 computed tomography lung segmentation and heatmap-based lesion localization: A multicenter study using COVLIAS 2.0. Comput Biol Med. 2022;146:105571. pmid:35751196
  44. Azad R, Aghdam EK, Rauland A, Jia Y, Avval AH, Bozorgpour A. Medical image segmentation review: The success of U-Net. IEEE Trans Pattern Anal Mach Intell. 2024.
  45. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv; 2017.
  46. Ruiping Y, Kun L, Shaohua X, Jian Y, Zhen Z. ViT-UperNet: a hybrid vision transformer with unified-perceptual-parsing network for medical image segmentation. Complex Intell Syst. 2024;10(3):3819–31.
  47. Yin J, Chen Y, Li C, Zheng Z, Gu Y, Zhou J. Swin-TransUper: Swin Transformer-based UperNet for medical image segmentation. Multimed Tools Appl. 2024;83(42):89817–36.
  48. Wang R, Chen X, Zhang X, He P, Ma J, Cui H, et al. Automatic segmentation of esophageal cancer, metastatic lymph nodes and their adjacent structures in CTA images based on the UperNet Swin network. Cancer Med. 2024;13(18):e70188. pmid:39300922
  49. Rajamani ST, Rajamani K, Schuller BW. A novel and simple approach to regularise attention frameworks and its efficacy in segmentation. Annu Int Conf IEEE Eng Med Biol Soc. 2023;2023:1–4. pmid:38082715
  50. Tang YB, Tang YX, Xiao J, Summers RM. XLSor: A robust and accurate lung segmentor on chest X-rays using criss-cross attention and customized radiorealistic abnormalities generation. In: International Conference on Medical Imaging with Deep Learning, 2019:457–67.
  51. Sharma GK, Kumar S, Ranga V, Murmu MK. Artificial intelligence in cerebral stroke images classification and segmentation: A comprehensive study. Multimed Tools Appl. 2023;83(14):43539–75.
  52. Zhou X, Wu G, Sun X, Hu P, Liu Y. Attention-Based Multi-Kernelized and Boundary-Aware Network for image semantic segmentation. Neurocomputing. 2024;597:127988.
  53. Jiang Y, Liang J, Cheng T, Zhang Y, Lin X, Dong J. MCPANet: Multiscale Cross-Position Attention Network for Retinal Vessel Image Segmentation. Symmetry. 2022;14(7):1357.
  54. Liu Y, Fu W, Selvakumaran V, Phelan M, Segars WP, Samei E. Deep learning of 3D computed tomography (CT) images for organ segmentation using 2D multi-channel SegNet model. In: Medical Imaging 2019: Imaging Informatics for Healthcare, Research, and Applications, 2019:319–26.
  55. Şahin N, Alpaslan N, Hanbay D. Robust optimization of SegNet hyperparameters for skin lesion segmentation. Multimed Tools Appl. 2021;81(25):36031–51.
  56. Dabass M, Dabass J. An Atrous Convolved Hybrid Seg-Net Model with residual and attention mechanism for gland detection and segmentation in histopathological images. Comput Biol Med. 2023;155:106690. pmid:36827788
  57. Ullah N, Khan JA, El-Sappagh S, El-Rashidy N, Khan MS. A Holistic Approach to Identify and Classify COVID-19 from Chest Radiographs, ECG, and CT-Scan Images Using ShuffleNet Convolutional Neural Network. Diagnostics (Basel). 2023;13(1):162. pmid:36611454
  58. Al-Fahsi RDH, Aqthobirrobbany A, Ardiyanto I, Nugroho HA. MSGNet: Modified MobileNet-ShuffleNet-GhostNet Network for Lightweight Retinal Vessel Segmentation. In: 2023 10th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE), 2023:94–9.
  59. Loshchilov I, Hutter F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv; 2017.
  60. Isensee F, Petersen J, Klein A, Zimmerer D, Jaeger PF, Kohl S. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation. arXiv. 2018. http://arxiv.org/abs/1809.10486
  61. Xue X, Liang D, Wang K, Gao J, Ding J, Zhou F, et al. A deep learning-based 3D Prompt-nnUnet model for automatic segmentation in brachytherapy of postoperative endometrial carcinoma. J Appl Clin Med Phys. 2024;25(7):e14371. pmid:38682540
  62. Huang L, Miron A, Hone K, Li Y. Segmenting Medical Images: From UNet to Res-UNet and nnUNet. In: 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS), 2024:483–9.
  63. Han S, Pool J, Tran J, Dally W. Learning both weights and connections for efficient neural network. Adv Neural Inf Process Syst. 2015;28.
  64. Li H, Kadav A, Durdanovic I, Samet H, Graf HP. Pruning Filters for Efficient ConvNets. In: International Conference on Learning Representations (ICLR), 2017.
  65. Jacob B, Kligys S, Chen B, Zhu M, Tang M, Howard A, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018:2704–13.
  66. Jaccard P. Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bull Soc Vaudoise Sci Nat. 1901;37:241–72.
  67. Cao L, Guo Y, Yuan Y, Jin Q. Prototype as query for few shot semantic segmentation. Complex Intell Syst. 2024;10(5):7265–78.
  68. Dice LR. Measures of the Amount of Ecologic Association Between Species. Ecology. 1945;26(3):297–302.