Abstract
Gallbladder cancer, a common yet often underdiagnosed malignancy, is typically characterized by late detection and a poor prognosis. The rise of deep learning has introduced new methods for its early identification through B-ultrasound imaging, but challenges of inefficient data labeling and feature extraction remain. This paper introduces a novel classification algorithm, ASGBC, intended to tackle these challenges in diagnosing gallbladder cancer from B-ultrasound images. Firstly, we combine active learning with self-supervised learning to decrease the reliance on labeled data. Secondly, we introduce the MsHop module, which captures the fine textures and patterns in ultrasound images through the integration of multi-scale and high-order information, thereby improving diagnostic accuracy. Additionally, we develop a dual-branch loss function that leverages data correlation and clustering features to enhance feature extraction and model stability. Experiments on a gallbladder ultrasound dataset confirm the effectiveness of our algorithm, which achieves an accuracy of 0.884, a specificity of 0.932, and a sensitivity of 0.912, outperforming existing methods. The results also exhibit lower variance, indicating improved model stability. Furthermore, the findings demonstrate that with active learning, results comparable to those from the full dataset can be achieved with only 35% of the data, reducing annotation costs and increasing model learning efficiency. Future research will concentrate on refining the algorithm for wider clinical use and on identifying additional features that may further improve diagnostic accuracy.
Citation: Li J, Zhou Y-Q (2025) Enhanced gallbladder cancer detection via active and self-supervised learning integration: Innovating B-ultrasound image analysis. PLoS One 20(9): e0330781. https://doi.org/10.1371/journal.pone.0330781
Editor: Li Yang, Sichuan University, CHINA
Received: March 14, 2025; Accepted: August 5, 2025; Published: September 16, 2025
Copyright: © 2025 Li, Zhou. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All GBCU files are available from the GBCU database (https://gbc-iitd.github.io/data/gbcu).
Funding: This work was supported by the Scientific Research Start-Up Project of CUIT (KYTZ202120 to JL), Sichuan Science and Technology Program (2023ZYD00011 to YQ Z), Key Project of the Open Fund of the Sichuan National Applied Mathematics Center (2024-KDJJ-02-001 to YQ Z), and Key Project of the Open Fund of the Sichuan National Applied Mathematics Center (2025-KFJJ-02-001 to YQ Z).
Competing interests: The authors have declared that no competing interests exist.
Introduction
Gallbladder cancer (GBC) is a highly malignant tumor with a very poor prognosis, posing a significant threat to global public health. Although bile duct malignancies are relatively rare, GBC is the most common and aggressive among them, particularly affecting women [1]. GBC ranks 22nd among all common cancers, with a higher incidence in women (20th) than in men (23rd) [2]. It is the sixth leading cause of cancer-related deaths among various malignancies. According to the GLOBOCAN 2020 report, there were 115,949 new cases and 84,695 deaths worldwide in 2020, across all ages and genders. Asia has the highest incidence and mortality rates, accounting for 71.9% and 75.0% of the global figures, respectively [3]. Early detection is therefore crucial for effective treatment and for improving patient survival rates.
Given the anatomical location of GBC and its asymptomatic nature or symptoms that mimic other diseases, early detection is challenging. It is often discovered incidentally after gallbladder removal surgery for other indications [4]. A study conducted in 2021 showed that less than 15% of GBC cases diagnosed in the United States between 2013 and 2017 were at a localized stage. The majority were diagnosed at later stages, with 38.7% at the regional stage and 44.4% at the distant stage [5].
Ultrasound is the primary diagnostic tool for gallbladder diseases due to its safety, cost-effectiveness, and ease of use. It is commonly used for the initial assessment of suspected gallbladder diseases. In resource-limited countries, it is often the only imaging examination available to patients with abdominal diseases. However, ultrasound images can be compromised by noise artifacts, such as speckle noise, which degrades image quality—a problem not commonly found in CT, MRI, PET, and SPECT imaging modalities. Detecting malignant gallbladders is more challenging due to the lack of clear boundaries or morphological features compared to normal and benign gallbladder areas. Therefore, the accuracy of ultrasound diagnosis largely depends on the experience of the sonographer and the diagnosing physician [6]. While certain characteristics aid in identifying GBC on ultrasound images, differentiating it from other abnormalities in the early stages remains difficult [7]. Thus, analyzing ultrasound images to establish and understand the characteristics of malignant gallbladder tumors is essential for enhancing the recognition and differentiation of GBC, ultimately preventing under-treatment and over-treatment.
The evolution of deep learning has opened new avenues for medical diagnostics using ultrasound images. Many researchers have applied deep learning techniques to analyze ultrasound images across different organs, thereby assisting in medical diagnosis. This encompasses a range of conditions, including breast tumors [8,9], prostate nodules [10,11], thyroid nodules [12,13], ocular diseases [14,15], pulmonary conditions [16,17], and fetal imaging [18,19]. However, compared with CT and MRI, studies based on B-ultrasound images are far fewer, and those concerning GBC are rarer still.
Self-supervised learning (SSL) and active learning (AL) hold significant potential in medical image analysis, enhancing a model's learning capability and diagnostic accuracy through intelligent data selection and exploitation of the data's inherent structure [20,21]. SSL trains models by predicting transformations or attributes of the data without external labeling, making full use of unlabeled image data [22–24]. AL allows models to identify and request labels for the most uncertain samples, optimizing the learning process under limited expert resources [25,26]. These methods can strengthen a model's ability to recognize subtle features in medical images and improve generalization to different lesion types, which is crucial for increasing diagnostic efficiency and accuracy. However, the majority of existing studies have relied on supervised learning methodologies, necessitating fully labeled datasets.
At present, there is no research on applying AL to B-ultrasound images. A smaller body of work has explored SSL algorithms for pre-training models to extract features from B-mode ultrasound images. Nguyen et al. [27] assessed the efficacy of the BYOL algorithm [28] for classifying breast ultrasound images on a public expert-annotated breast dataset. Mishra et al. [29] pre-trained an encoder-decoder architecture on deterministic edge-detection and segmentation pretext tasks that require no manual labels. Experiments on two public datasets demonstrated that SSL enhances performance, particularly when labeled training data are scarce. Zhao and Yang [30] utilized the public TN-SCUI2020 dataset to pre-train classifiers for distinguishing between benign and malignant thyroid nodules. Jiao et al. [31] and Chen et al. [32] applied SSL to obstetric ultrasound image analysis tasks. Liu et al. [33] pre-trained an encoder-decoder model for the downstream task of classifying gastrointestinal stromal tumors from endoscopic ultrasound images.
Using deep learning models to analyze ultrasound images thus poses major challenges. Firstly, unlike MRI or CT, ultrasound images tend to have lower quality and are susceptible to noise and sensor artifacts; existing feature extractors, which are primarily designed for natural images, are therefore prone to learning from spurious textures and failing to capture the true characteristics of GBC. Moreover, most existing classification algorithms for GBC rely on fully supervised learning, requiring all data to be labeled. The algorithms proposed by Basu et al. [34,35] are based on unsupervised and self-supervised learning, but they require B-ultrasound video information, leading to high training resource demands. The integration of AL and SSL offers a promising yet unexplored path. To reduce training costs while enhancing the accuracy of diagnosing GBC from B-ultrasound images, this paper introduces a classification algorithm for GBC that combines AL and SSL, called ASGBC. The specific contributions are as follows:
- Integration of AL and SSL: We implement AL prior to SSL to preselect data with high information value. This proactive selection reduces the training time for the classification model, decreases the demand for computational resources, and lowers deployment costs.
- Design of the MsHop Module: By extracting multi-scale and high-order information from images simultaneously, it comprehensively encodes tumor characteristics, ensuring a detailed and accurate feature representation.
- Design of a Dual-Branch Loss Function: It considers both data correlation and clustering features, making feature extraction more refined and robust, thereby improving the model’s predictive accuracy and stability.
Related works
Deep learning applications for gallbladder cancer diagnosis. With the development of deep learning technology, its application in medical image analysis has provided new perspectives for improving the accuracy and efficiency of diagnosing GBC. Recent studies have introduced several innovative methods in this domain. Lian et al. [36] presented an automatic segmentation method for gallbladder and gallstone regions in ultrasound images, integrating an improved Otsu algorithm, anisotropic diffusion, global morphology filtering, a parameter-adaptive pulse-coupled neural network (PA-PCNN), and locally weighted regression smoothing (LOESS) for enhanced accuracy and efficiency. Jeong et al. [37] developed a deep learning-based decision support system (DL-DSS) that significantly improved the performance of gallbladder polyp diagnosis on ultrasound through transfer learning, demonstrating that the diagnostic performance assisted by DL-DSS was superior to that of individual radiologists. Kim et al. [38] enhanced the classification accuracy of gallbladder polyps less than 20 millimeters by using an ensemble convolutional neural network model, showing the potential of deep learning to improve clinical diagnostic specificity. Basu et al. [39] developed GBCNet, a CNN-based model that excels in GBC detection from ultrasound images. It overcomes challenges of low image quality and spurious textures through a novel ROI extraction method, a multi-scale second-order pooling architecture, and a curriculum inspired by human visual acuity, outperforming both state-of-the-art models and expert radiologists. Basu et al. [34] also introduced an innovative unsupervised contrastive learning (UCL) framework that uses hard negatives from temporally distant frames within the same ultrasound video, along with a hardness-sensitive negative mining curriculum, to enhance image representation learning for gallbladder malignancy detection, achieving higher accuracy than state-of-the-art techniques. 
Shuvo and Chowdhury [40] proposed a method for GBC classification using an ensemble of well-known convolutional neural network models, significantly enhancing classification accuracy.
Active learning. Existing AL approaches can be divided into two main groups: distribution-based and uncertainty-based methods. AL has emerged as a pivotal paradigm for enhancing the efficiency of model training by selectively labeling informative data points. Sener and Savarese [41] redefined AL as core-set selection, focusing on choosing a diverse subset of data. Pinsler et al. [42] offered a Bayesian batch AL method that approximates the posterior for model parameters, enabling scalable AL. Sinha et al. [43] introduced Variational Adversarial Active Learning (VAAL), leveraging a VAE and adversarial network for representation learning. Xie et al. [44] proposed Energy-based Active Domain Adaptation (EADA), utilizing energy-based models to reduce domain gaps. Cabannes et al. [45] presented Positive Active Learning (PAL), a framework that integrates SSL with AL by querying semantic relationships. These works collectively advance the field by addressing challenges in labeling efficiency, representation learning, and scalability in various learning scenarios.
Self-supervised learning. SSL has made significant strides in recent years, with various innovative approaches proposed to learn meaningful representations without labeled data. The introduction of frameworks like SimCLR [46], MoCo [47], and BYOL [28] has revolutionized the field by simplifying the learning process and enhancing the quality of learned representations. These methods focus on maximizing the similarity between augmented versions of the same image, effectively learning to discriminate between different instances. Additionally, SwAV [48] introduced an online clustering approach that contrasts cluster assignments, further improving the efficiency and scalability of SSL. Barlow Twins [49] presented a redundancy reduction principle to ensure informative yet invariant representations. Innovative works by He et al. [50], Chen et al. [51], and Bao et al. [52] significantly advanced the state-of-the-art by introducing masked autoencoders (MAE) and their scalable variants, demonstrating the efficacy of reconstructing masked image patches for learning meaningful representations. The seminal paper by Huang et al. [53] on Vision Transformers has further catalyzed research in this domain, showing that self-supervised methods can be effectively scaled up with the right architectural choices. The recent breakthrough by Mishra et al. [54] presented a simple yet efficient contrastive masked autoencoder, highlighting the complementary nature of contrastive learning and MAE. Furthermore, the work of Oquab et al. [55] on DINOv2 underscores the potential of SSL to produce versatile visual features that are competitive with weakly-supervised models across diverse tasks. Collectively, these works have pushed the boundaries of unsupervised visual representation learning, achieving competitive results with supervised counterparts and opening new avenues for research in computer vision.
In summarizing the related work on diagnosing GBC on B-ultrasound images using deep learning, the majority of explorations have been based on deep learning schemes that rely solely on labeled data. While some methods have indeed harnessed the information stored in unlabeled data through SSL [34,35], they all necessitate the use of video-level data rather than individual B-ultrasound images. Furthermore, no methods have addressed the reduction of labeling costs and computational resources through the use of AL. In this paper, we combine AL with SSL, effectively saving on training-related costs and enhancing the model’s accuracy and robustness by improving the feature extraction network and loss function within SSL.
Method
Overview of the framework
Our proposed approach ASGBC combines AL and SSL to minimize training expenses and the labor of labeling, while enhancing the model’s precision and stability through the integration of multi-scale, high-order information, as well as features that account for image correlation and clustering. The flowchart of our algorithm is illustrated as shown in Fig 1.
Fig 1. The proposed method, ASGBC, integrates AL in the first phase and SSL in the second phase. This integration helps to reduce the required training resources and labeling effort to less than 35% of the dataset.
For details on the specific algorithmic process, please refer to Algorithm 1. Initially, AL is employed to train the data selection network, which selects the most informative samples XS from the pool of unlabeled data X (details can be found in section Data selection). Following this, a feature extractor that incorporates multi-scale modules and high-order pooling modules is trained on the selected data XS in conjunction with SSL techniques (for more information, please refer to section Get extractor). The learned features can then be applied to fine-tune various downstream tasks, including but not limited to classification, localization, and detection. In this paper, the downstream task is the classification of GBC in B-ultrasound images. A classifier with a single linear layer is appended to the feature extraction backbone, with the feature extractor's parameters frozen to retain the generalized features learned. The classifier is then fine-tuned using a modest amount of labeled B-ultrasound data to obtain the final classification model.
Algorithm 1. ASGBC.
1: Input: Unlabeled dataset X
2: Output: Trained GBC classification model
3: Phase 1: Data Selection (Active Learning based on VAAL)
4: Randomly select 5% of the data from X to initialize XS, and set XU = X \ XS
5: for epoch = 1 to epochs do
6:   Sample xL from XS and xU from XU
7:   Compute L_VAE^trd and L_VAE^adv using Eqs (1) and (2), respectively (Eq (1): transductive VAE loss; Eq (2): adversarial VAE loss)
8:   Compute L_VAE using Eq (4) (combined VAE loss integrating transductive and adversarial components)
9:   Update the VAE by descending its stochastic gradient
10:  Compute L_D using Eq (3) (discriminator loss for active learning)
11:  Update D by descending its stochastic gradient
12:  Train and update the task module T
13: end for
14: Select the samples XS to which the discriminator D assigns the lowest probability of being labeled (i.e., the samples with the highest uncertainty in the unlabeled pool)
15: XS ← XS ∪ {selected samples}
16: XU ← XU \ {selected samples}
17: Phase 2: Feature Extractor Training (Self-Supervised Learning)
18: for epoch = 1 to epochs do
19:   for batch = 1 to batches do
20:     Sample a batch of data xS from XS
21:     Generate two different augmentations X1 = T1(xS) and X2 = T2(xS), respectively
22:     Extract the features Y1 and Y2 using the feature extractor fθ
23:     Compute the embeddings Z1 and Z2 using the expander hφ
24:     Compute the codes Q1 and Q2 using Eq (17)
25:     Compute Lcon using Eq (11) (correlation-branch loss for feature invariance)
26:     Compute Lclu using Eq (15) (clustering-branch loss for feature discriminability)
27:     Compute Ltotal using Eq (5) (combined loss integrating the two branches)
28:     Minimize Ltotal to update the parameters θ and φ
29:   end for
30: end for
31: Phase 3: Supervised Fine-tuning
32: Annotate the selected data XS for classification
33: Freeze the weights of the feature extractor fθ
34: Fine-tune the linear classification layer with supervised training on XS to obtain the GBC classification model
35: return Trained GBC classification model
Data selection
This section explains how we utilize AL to select the most informative data. Although our method is compatible with any AL technique, we draw inspiration from Variational Adversarial Active Learning (VAAL) [43]. We employ a β-Variational Autoencoder (β-VAE) [56] and an adversarial network [57] to implicitly learn the sampling mechanism. The β-VAE is responsible for learning the latent space representation of the data, while the adversarial network discerns between labeled and unlabeled data. A minimax game is played between the β-VAE and the adversarial network, where the β-VAE aims to deceive the adversarial network into believing that all data points originate from the labeled data pool. Concurrently, the adversarial network strives to differentiate between them within the latent space. For specific algorithm steps, refer to Algorithm 1 (Phase 1). This phase does not require any labels; instead, it is entirely based on the intrinsic characteristics of the data.
The transductive representation-learning objective of the VAE is:

\[
\mathcal{L}_{VAE}^{trd} = \mathbb{E}\big[\log p_{\theta}(x_L \mid z_L)\big] - \beta\, \mathrm{D}_{KL}\big(q_{\phi}(z_L \mid x_L) \,\|\, p(z)\big) + \mathbb{E}\big[\log p_{\theta}(x_U \mid z_U)\big] - \beta\, \mathrm{D}_{KL}\big(q_{\phi}(z_U \mid x_U) \,\|\, p(z)\big) \tag{1}
\]

The adversarial representation-learning objective of the VAE is:

\[
\mathcal{L}_{VAE}^{adv} = -\mathbb{E}\big[\log D\big(q_{\phi}(z_L \mid x_L)\big)\big] - \mathbb{E}\big[\log D\big(q_{\phi}(z_U \mid x_U)\big)\big] \tag{2}
\]

The training objective of the adversarial network is:

\[
\mathcal{L}_{D} = -\mathbb{E}\big[\log D\big(q_{\phi}(z_L \mid x_L)\big)\big] - \mathbb{E}\big[\log\big(1 - D\big(q_{\phi}(z_U \mid x_U)\big)\big)\big] \tag{3}
\]

The complete VAE objective of VAAL is:

\[
\mathcal{L}_{VAE} = \lambda_1 \mathcal{L}_{VAE}^{trd} + \lambda_2 \mathcal{L}_{VAE}^{adv} \tag{4}
\]

where \(\mathbb{E}[\cdot]\) denotes mathematical expectation; \(q_{\phi}\) and \(p_{\theta}\) are the encoder and decoder, parameterized by \(\phi\) and \(\theta\), respectively; \(p(z)\) is the chosen prior distribution, usually the unit Gaussian; \(\beta\) is the Lagrange multiplier of the optimization problem; \(\mathrm{D}_{KL}\) denotes the Kullback-Leibler divergence; \(D\) is the discriminator of the adversarial network; and \(\lambda_1\) and \(\lambda_2\) are hyperparameters that determine the contribution of each component to learning an effective variational adversarial representation.
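After the minimax game converges, samples are chosen by how confidently the discriminator judges them to be unlabeled: the samples the VAE failed to make look "labeled" are the most informative ones to annotate. The following is a minimal illustrative sketch of this selection rule (our own illustration with hypothetical scores, not the authors' code):

```python
import numpy as np

def select_informative(disc_prob_labeled, budget):
    """Rank unlabeled samples by the discriminator's probability that they
    come from the labeled pool; the lowest-scoring samples are the most
    informative ones, so they are selected for annotation."""
    order = np.argsort(disc_prob_labeled)  # ascending: least "labeled-like" first
    return order[:budget]

# toy example: discriminator scores for 8 unlabeled samples, budget of 3
probs = np.array([0.91, 0.12, 0.55, 0.08, 0.77, 0.30, 0.95, 0.41])
picked = select_informative(probs, 3)
```

In the full pipeline this selection is repeated until the target fraction of the pool (35% in our experiments) has been gathered.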
Get extractor
In this section, we will describe how to employ an enhanced SSL algorithm to train a feature extractor for gallbladder B-ultrasound images. The framework of this SSL algorithm is illustrated in Fig 1 (Phase 2). Similar to traditional contrastive learning methods, it features two branches for data augmentation, with distinctions in the following two aspects.
1. The feature extractor is specially designed, incorporating a multi-scale high-order feature extraction module (MsHop) into the ResNet backbone. This allows the integration of multi-scale and high-order information from images, leading to more accurate extraction of GBC features.
2. A unique dual-branch loss function is designed, integrating the correlation and clustering features of the feature maps. It optimizes the model from multiple perspectives, enhancing the feature extraction capability while also strengthening the model’s stability.
Algorithm 1 (Phase 2) provides a detailed step-by-step guide for training a feature extractor using the improved self-supervised algorithm. Given a batch of images xS, two different batches of views X1 and X2 are generated by transformations T1 and T2, respectively, and then encoded into representations Y1 and Y2 using a feature extractor \(f_{\theta}\). These representations are processed by two different branches, which handle feature encoding differently, resulting in two losses Lcon and Lclu through intra-batch decorrelation and swapped prediction, respectively. The final loss is a combination of these two losses:

\[
\mathcal{L}_{total} = \mathcal{L}_{con} + \alpha \mathcal{L}_{clu} \tag{5}
\]

where α is a hyperparameter.
We will now proceed to detail the MsHop module and the dual-branch loss function.
The core concept of multi-scale design is to establish hierarchical residual connections within a single residual block, representing multi-scale features and increasing the receptive field of each network layer. Higher-order pooling aims to use all three feature dimensions to learn a robust second-order covariance representation, thereby enhancing the accuracy of diagnosing GBC in ultrasound images. The MsHop module integrates these two approaches, as shown in Fig 2. This module takes into account both multi-scale information and the height, width, and channel dimensions to enhance the learned second-order statistical information.
Firstly, we introduce the MsHop module; its computational process is depicted in Fig 2.

Assume the input feature map \(F\) has size \(C \times H \times W\). It is divided into four subsets \(F_1, F_2, F_3, F_4\) along the channel (depth) direction, each subset \(F_i\) having size \((C/4) \times H \times W\). Each subset is processed by a \(3 \times 3\) convolution \(K_i\), resulting in output \(S_i\). Except for \(F_1\), each \(F_i\) is first added to the output \(S_{i-1}\) of the previous filter bank and then fed into the next filter bank \(K_i\). This process can be expressed as:

\[
S_i = \begin{cases} K_i(F_i), & i = 1 \\ K_i(F_i + S_{i-1}), & 1 < i \le 4 \end{cases}
\]

where \(K_i\) represents the \(i\)-th group of \(3 \times 3\) convolutions. Finally, all \(S_i\) are merged by concatenation and fused by a \(1 \times 1\) convolution \(K_{1 \times 1}\), yielding the middle feature map \(F_{mid}\):

\[
F_{mid} = K_{1 \times 1}\big(\mathrm{concat}(S_1, S_2, S_3, S_4)\big)
\]

Assume \(F_{mid}\) has size \(C \times H \times W\). Three \(1 \times 1\) convolution layers reduce the feature dimension to \(C' \times H \times W\), \(C \times H' \times W\), and \(C \times H \times W'\), respectively, where \(C' < C\), \(H' < H\), and \(W' < W\). Then, the covariance matrices of the reduced features are computed for each dimension \(\Sigma_C\), \(\Sigma_H\), and \(\Sigma_W\):

\[
\Sigma_d = \bar{X}_d \bar{X}_d^{\top}, \quad d \in \{C, H, W\}
\]

where \(\bar{X}_d\) denotes the mean-centered reduced feature matrix unfolded along dimension \(d\). Three statistical weight vectors \(w_C\), \(w_H\), and \(w_W\) are generated from the corresponding covariance matrices using row-wise convolution, and each weight vector is multiplied with the middle feature map \(F_{mid}\). The scaled feature maps along the three dimensions are fused to generate the output feature map \(F_{out}\):

\[
F_{out} = \mathrm{Fuse}\big(w_C \odot F_{mid},\; w_H \odot F_{mid},\; w_W \odot F_{mid}\big)
\]
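The channel branch of this second-order scaling can be sketched as follows. This is an illustrative NumPy reimplementation of covariance-based channel attention only: the row-wise convolution is approximated here by a row-wise mean, and the multi-scale convolutions and the H/W branches are omitted.

```python
import numpy as np

def channel_covariance_attention(F):
    """Second-order channel attention in the spirit of MsHop's channel branch:
    compute the channel covariance over spatial positions, collapse each row
    of the covariance matrix to a scalar statistic (approximating the
    row-wise convolution with a row-wise mean), and rescale the channels of F
    with sigmoid-normalized weights. F has shape (C, H, W)."""
    C, H, W = F.shape
    X = F.reshape(C, H * W)                      # unfold spatial dimensions
    Xc = X - X.mean(axis=1, keepdims=True)       # center each channel
    cov = (Xc @ Xc.T) / (H * W - 1)              # C x C channel covariance
    w = 1.0 / (1.0 + np.exp(-cov.mean(axis=1)))  # row statistic -> sigmoid weight
    return F * w[:, None, None]                  # scale each channel

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 4, 4))
out = channel_covariance_attention(F)
```

The H and W branches would follow the same pattern after unfolding the feature map along the height and width dimensions, respectively.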
Then, we introduce the dual-branch loss function (see Fig 1). The first branch sends Y1 and Y2 to an expander \(h_{\phi}\), resulting in embeddings Z1 and Z2 (the expander consists of three fully connected layers). Positive samples are different augmentations of the same image, and negative samples are different images in the same batch. The loss function is calculated using the following regularization terms:

\[
\mathcal{L}_{con} = \lambda\, s(Z_1, Z_2) + \mu \big[ v(Z_1) + v(Z_2) \big] + \gamma \big[ c(Z_1) + c(Z_2) \big] \tag{11}
\]

where \(\lambda\), \(\mu\), and \(\gamma\) are hyperparameters; \(s(Z_1, Z_2) = \frac{1}{n} \sum_{i} \| z_{1,i} - z_{2,i} \|_2^2\) is the invariance criterion, where \(n\) is the number of images in the batch; \(c(Z) = \frac{1}{d} \sum_{i \neq j} [C(Z)]_{i,j}^2\) is the covariance regularization term, where \(C(Z)\) is the covariance matrix of \(Z\); \(v(Z) = \frac{1}{d} \sum_{j=1}^{d} \max\big(0, 1 - S(z^{j}, \varepsilon)\big)\) is the variance regularization term, where \(d\) is the dimensionality of the embedding and \(S\) is the regularized standard deviation, defined as \(S(x, \varepsilon) = \sqrt{\mathrm{Var}(x) + \varepsilon}\); and \(\varepsilon\) is a small scalar to prevent numerical instability.
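The three regularization terms can be sketched as follows. This is a VICReg-style reimplementation for illustration only; the default hyperparameter values mirror those used in our experiments, and the function and variable names are our own.

```python
import numpy as np

def dual_branch_con_loss(Z1, Z2, lam=25.0, mu=25.0, gamma=1.0, eps=1e-4):
    """Correlation-branch loss on two batches of embeddings (n x d):
    invariance (paired distance) + variance (hinge on per-dim std)
    + covariance (off-diagonal decorrelation), as in Eq (11)."""
    n, d = Z1.shape
    # invariance: mean squared distance between paired embeddings
    s = np.mean(np.sum((Z1 - Z2) ** 2, axis=1))

    def variance_term(Z):
        std = np.sqrt(Z.var(axis=0) + eps)          # regularized std per dimension
        return np.mean(np.maximum(0.0, 1.0 - std))  # hinge at 1

    def covariance_term(Z):
        Zc = Z - Z.mean(axis=0)
        C = (Zc.T @ Zc) / (n - 1)                   # d x d covariance matrix
        off = C - np.diag(np.diag(C))               # off-diagonal entries only
        return np.sum(off ** 2) / d

    return (lam * s
            + mu * (variance_term(Z1) + variance_term(Z2))
            + gamma * (covariance_term(Z1) + covariance_term(Z2)))
```

Intuitively, the invariance term pulls the two views together, while the variance and covariance terms prevent the trivial collapse that would otherwise minimize it.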
The second branch adopts the idea of online clustering. Features Y1 and Y2 are assigned to prototype vectors C to obtain "codes" Q1 and Q2. The clustering loss is calculated by "swapping" the prediction problem:

\[
\mathcal{L}_{clu} = \ell(Y_1, Q_2) + \ell(Y_2, Q_1) \tag{15}
\]

where \(\ell(Y, Q)\) is defined as:

\[
\ell(Y, Q) = - \sum_{k} q^{(k)} \log p^{(k)}, \qquad p^{(k)} = \frac{\exp\!\big(\tfrac{1}{\tau} Y^{\top} c_k\big)}{\sum_{k'} \exp\!\big(\tfrac{1}{\tau} Y^{\top} c_{k'}\big)}
\]

with \(p^{(k)}\) indicating the matching probability of prototype \(c_k\) and feature \(Y\), and \(\tau\) a temperature parameter. The codes are obtained by solving an optimization problem that maximizes the similarity between features and prototypes while keeping the codes smooth:

\[
\max_{Q \in \mathcal{Q}} \; \mathrm{Tr}\big(Q^{\top} C^{\top} Y\big) + \varepsilon H(Q) \tag{17}
\]

where \(H(Q)\) is the entropy function of the codes \(Q\), and \(\varepsilon\) controls the smoothness of the mapping.
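In practice, this entropy-smoothed assignment problem is typically solved with a few Sinkhorn-Knopp iterations, as in SwAV. The following is a minimal NumPy sketch of that computation; the iteration count and ε values here are illustrative, not tuned.

```python
import numpy as np

def sinkhorn_codes(scores, eps=0.05, n_iters=3):
    """Compute soft assignment codes Q from a feature-prototype similarity
    matrix of shape (B, K) via Sinkhorn-Knopp iterations, alternately
    normalizing prototype mass (rows) and per-sample mass (columns).
    A smaller eps yields sharper (less smooth) codes."""
    Q = np.exp(scores / eps).T             # K x B
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)  # rows: equal mass per prototype
        Q /= K
        Q /= Q.sum(axis=0, keepdims=True)  # columns: unit mass per sample
        Q /= B
    return (Q * B).T                       # B x K, each row sums to 1

rng = np.random.default_rng(1)
scores = rng.standard_normal((6, 3))       # 6 samples, 3 prototypes
Q = sinkhorn_codes(scores)
```

The resulting rows of Q are valid probability distributions over prototypes, which is what the swapped-prediction loss in Eq (15) consumes.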
Experiments
In this section, we first compare our method with several recently proposed CNN-based SSL methods on the GBCU dataset. Next, we perform an ablation study of the proposed model. Finally, we evaluate the impact of the AL module. All experiments were run under the PyTorch framework on a computer equipped with a GTX-3060 GPU and an i7-12700 @ 2.10 GHz CPU.
Dataset. The GBCU dataset, introduced by Basu et al. [39] (https://gbc-iitd.github.io/data/gbcu, e-mail: soumen.basu@cse.iitd.ac.in), is a comprehensive collection of 1,255 ultrasound images. It includes 432 normal, 558 benign, and 265 malignant gallbladder cases, all derived from 218 patients; 71, 100, and 47 of these patients belong to the normal, benign, and malignant categories, respectively. This meticulously annotated dataset provides both image-level labels and bounding-box annotations for malignant regions, which is pivotal for advancing gallbladder cancer (GBC) detection research. We report cross-validation results over ten runs on the entire dataset, which were used in the key experiments to evaluate model generalization. To ensure generalization to unseen patients, all images from any particular patient appear exclusively in either the training or the validation set during cross-validation.
Experimental setting. We use the weights of ResNet50 pre-trained on the ImageNet-1k dataset as the initial values for part of the feature extractor, while the remaining parameters are initialized from a normal distribution. The coefficient β is set to 1 in Eq (1); λ1 and λ2 are set to 1 in Eq (4); α is set to 0.1 in Eq (5); λ and μ are set to 25 and γ to 1 in Eq (11); and ε is set to 0.0001 in Eq (17). We use the Adam optimizer to minimize the total loss and train the entire framework for 800 iterations. The batch size is 64, the initial learning rate is 0.003, the momentum is 0.9, weight decay is applied, and the learning rate adjustment follows a cosine schedule.
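For reference, the cosine learning-rate schedule above can be written in closed form. The sketch below assumes decay to zero over the full training run, a common default; the exact warmup or floor used in any given framework may differ.

```python
import math

def cosine_lr(step, total_steps, lr_init=0.003, lr_min=0.0):
    """Cosine learning-rate schedule: decays from lr_init to lr_min
    over total_steps, following half a cosine period."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * t))

# learning rate at the start, midpoint, and end of 800 iterations
lrs = [cosine_lr(s, 800) for s in (0, 400, 800)]
```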
Evaluation metric. During the experimental phase, we assessed the performance of the proposed model through a comprehensive set of key evaluation metrics: accuracy, Macro-F1 score, and the sensitivity, specificity, and F1 score for the malignant class. Accuracy is the cornerstone metric, giving the proportion of correct predictions across all samples. The Macro-F1 score, the unweighted average of the per-class F1 scores, is an important metric for multi-class models: it accounts for the precision and recall of every class and thus reflects the model's overall diagnostic capability. We paid particular attention to the malignant category, aiming to reduce both missed diagnoses and misdiagnoses. Consequently, we also monitored the sensitivity, specificity, and F1 score for the malignant class to rigorously evaluate the model's diagnostic performance for this category. Together, these metrics provide a thorough and reliable assessment of the algorithm's ability to classify GBC.
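For clarity, the per-class metrics reduce to simple confusion-matrix ratios. The sketch below treats the malignant class as "positive"; the toy counts are hypothetical and not taken from our experiments.

```python
def binary_metrics(tp, fp, tn, fn):
    """Per-class metrics from confusion-matrix counts:
    sensitivity = recall on positives, specificity = recall on negatives,
    F1 = harmonic mean of precision and sensitivity."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f1

# hypothetical counts: 45 detected malignant cases, 5 missed,
# 90 correct non-malignant calls, 10 false alarms
acc, sens, spec, f1 = binary_metrics(tp=45, fp=10, tn=90, fn=5)
```

The Macro-F1 is then the unweighted mean of the per-class F1 scores computed this way for each of the three categories in turn.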
Comparison experiment
In this section, we compare the performance of our method with several recent algorithms, including SwAV [48], Barlow Twins [49], VicReg [58], SimCLR [46], and the recently introduced GBCNet [39], which is specifically designed for gallbladder cancer detection. The comparison focuses on key diagnostic metrics, including accuracy, specificity, sensitivity, F1-score, and Macro-F1, as summarized in Table 1 and Fig 3.
Fig 3. Each group represents an evaluation metric, with different colors signifying different algorithms. The height of a bar indicates the magnitude of the metric, and the line segment on top of each bar represents the standard deviation of the metric. The taller the bar, the better the performance; the shorter the line segment, the more stable the model.
Our method, ASGBC, achieves the highest performance in most metrics. Specifically, it attains an accuracy of 0.884 ± 0.038 (95% CI: 0.857–0.911), outperforming all other methods with statistically significant differences (all p<0.05).
In terms of sensitivity, ASGBC reaches 0.912 ± 0.091 (95% CI: 0.847–0.977), significantly higher than SwAV, Barlow Twins, VicReg, and SimCLR (all p<0.01), indicating a superior ability to detect malignant cases. Although GBCNet achieves a slightly higher sensitivity 0.923 ± 0.071, the difference is not statistically significant (p>0.05). Notably, ASGBC achieves comparable performance using only 35% of the labeled data, highlighting its efficiency and practical value in clinical settings where annotated data are scarce.
In terms of specificity, ASGBC achieves 0.932 ± 0.047 (95% CI: 0.898–0.966), the second highest, though not statistically different from the other methods (p>0.05). This suggests that our method maintains a strong ability to correctly identify non-malignant cases, reducing the risk of misdiagnosis.
The F1-score and Macro-F1, which reflect the balance between precision and recall, further demonstrate the robustness of our method. ASGBC achieves an F1-score of 0.844 ± 0.101 (95% CI: 0.772–0.916) and a Macro-F1 of 0.865 ± 0.069 (95% CI: 0.816–0.914), both of which are the highest among all compared methods. The improvements over SwAV, Barlow Twins, VicReg, and SimCLR are statistically significant (all p<0.05). Again, while GBCNet performs slightly worse in these metrics, the difference is not statistically significant (p>0.05).
Moreover, the low standard deviations observed in ASGBC’s metrics (e.g., accuracy SD = 0.038, Macro-F1 SD = 0.069) indicate that our model provides stable and reliable predictions across different test folds. In contrast, methods like Barlow Twins and SimCLR exhibit higher variability, which may limit their clinical applicability.
These results suggest that our method not only achieves superior diagnostic accuracy but also maintains robustness and generalizability. The integration of multi-scale feature extraction and dual-branch loss optimization contributes to its strong performance. Furthermore, the reduced dependency on labeled data makes ASGBC a promising tool for real-world clinical deployment, especially in resource-limited settings.
Clinical relevance and human-AI collaboration
In addition to comparing our ASGBC model with other computational methods, it is essential to evaluate its performance in the context of real-world clinical applications, particularly in relation to human expert diagnostic accuracy. To this end, we reference the human baseline performance reported by Basu et al. [39], where two experienced radiologists independently evaluated the same dataset used in our experiments. As shown in Table 2, Radiologist A achieved an accuracy of 0.816, specificity of 0.873, and sensitivity of 0.707, while Radiologist B achieved an accuracy of 0.784, specificity of 0.911, and sensitivity of 0.732. In comparison, our ASGBC model significantly outperforms both radiologists across all three metrics, achieving an accuracy of 0.884, specificity of 0.932, and sensitivity of 0.912.
To further assess the diagnostic consistency between our model and human experts, we conducted a Kappa consistency analysis. Due to budget constraints, we were unable to organize a new blind test involving multiple radiologists. Instead, we referenced the blind test data from the original dataset publication, which included diagnostic results from two radiologists. Since individual diagnostic results for each image were not available, we employed a mathematical derivation to estimate the Kappa coefficient. The detailed derivation is provided in the Appendix.
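For reference, once a 2x2 agreement table between two raters is available, Cohen's kappa follows from the observed and chance agreement rates; the counts in the example below are hypothetical and unrelated to the estimates in Table 3.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a 2x2 agreement table [[a, b], [c, d]], where
    rows are rater 1's labels and columns are rater 2's labels."""
    a, b = confusion[0]
    c, d = confusion[1]
    n = a + b + c + d
    p_o = (a + d) / n                      # observed agreement
    p_yes = ((a + b) / n) * ((a + c) / n)  # chance agreement on "positive"
    p_no = ((c + d) / n) * ((b + d) / n)   # chance agreement on "negative"
    p_e = p_yes + p_no
    return (p_o - p_e) / (1 - p_e)

# Hypothetical model-vs-radiologist agreement table (not the study's data)
print(round(cohen_kappa([[40, 5], [10, 45]]), 3))  # 0.7
```

The Appendix derivation estimates bounds on these cell counts from the published aggregate metrics, which is why Table 3 reports kappa as a range rather than a point value.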
As shown in Table 3, the Kappa coefficient ranges from 0.434 to 0.766 for Radiologist A and from 0.502 to 0.837 for Radiologist B, indicating that ASGBC aligns more closely with Radiologist B than with Radiologist A. Importantly, the worst-case Kappa values for both radiologists exceed the minimum clinical threshold of 0.40, so the consistency is acceptable in both cases, while the best-case values exceed the diagnostic independence standard (>0.75), implying strong and reliable agreement between the radiologists and ASGBC. Overall, these findings highlight the robustness of ASGBC in achieving high consistency with human expert assessments.

These results demonstrate that our model not only excels in computational benchmarks but also holds significant potential for enhancing clinical diagnostic accuracy. The high sensitivity of ASGBC (0.912) is particularly noteworthy, as it indicates a strong capability to correctly identify malignant cases, thereby reducing the risk of missed diagnoses, a critical factor in early-stage gallbladder cancer detection.
To further bridge the gap between AI performance and clinical utility, we envision the development of a real-time lesion prompting system that integrates the diagnostic strengths of ASGBC with the expertise of human clinicians. In practice, such a system would assist radiologists by providing rapid diagnostic suggestions and highlighting suspicious regions during ultrasound examinations. This collaborative approach would not only improve diagnostic speed and accuracy, especially in complex or early-stage cases, but also reduce the workload and cognitive burden on medical professionals.
Importantly, this human-AI partnership does not seek to replace clinicians but rather to augment their capabilities. By combining the computational efficiency and consistency of AI with the contextual judgment and experience of human experts, we aim to create a synergistic diagnostic workflow that maximizes the strengths of both. We believe that ASGBC can serve as a powerful clinical assistant, contributing to more accurate, efficient, and accessible medical diagnostics and ultimately improving patient outcomes.
Noise robustness analysis
To evaluate the robustness of the proposed ASGBC model under noise interference, we conducted experiments by adding Gaussian noise with a kernel sigma of 5 to the B-ultrasound images. The performance metrics under different noise levels are presented in Table 4.
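Concretely, such a perturbation can be injected as in the sketch below; mapping the percentage settings to a noise standard deviation of noise_level * 255, and omitting any kernel smoothing of the noise field, are simplifying assumptions of this illustration, not the paper's exact procedure.

```python
import random

def add_gaussian_noise(pixels, noise_level, seed=0):
    """Add zero-mean Gaussian noise to a grayscale image (list of rows of
    0-255 ints). The noise standard deviation is noise_level * 255, i.e. a
    fraction of the 8-bit dynamic range; results are clipped to [0, 255]."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, noise_level * 255))))
         for p in row]
        for row in pixels
    ]

# Hypothetical 4x4 mid-gray patch standing in for an ultrasound crop
img = [[128] * 4 for _ in range(4)]
out = add_gaussian_noise(img, noise_level=0.05)  # the 5% noise setting
print(len(out), len(out[0]))
```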
The results demonstrate that the ASGBC model maintains relatively stable performance under low noise levels. Specifically, with 1% Gaussian noise, the accuracy decreases by only 1.02%, specificity by 0.97%, sensitivity by 0.99%, F1-score by 0.95%, and Macro-F1 by 1.04%. As the noise level increases to 5%, the performance degradation becomes more pronounced, with accuracy decreasing by 2.49%, specificity by 2.47%, sensitivity by 2.63%, F1-score by 2.49%, and Macro-F1 by 2.54%. At 10% noise level, the model still retains reasonable performance, with accuracy at 0.859, specificity at 0.905, sensitivity at 0.885, F1-score at 0.820, and Macro-F1 at 0.840.
These findings indicate that the ASGBC model exhibits good robustness against Gaussian noise, which is crucial for clinical applications where ultrasound images may be affected by various noise artifacts. The model’s ability to maintain performance under noise interference highlights its potential for real-world deployment in clinical settings.
Ablation study
We use the VicReg [58] model, the best performer in the comparative experiment, as the baseline. Its feature extractor is a ResNet backbone, and its loss function includes only Lcon. The results presented in Table 5 and Fig 4 provide insights into the impact of the proposed enhancements on the baseline model for GBC classification using B-ultrasound images.
Baseline: The baseline model, serving as a reference, achieved an accuracy of 0.826 ± 0.045, a specificity of 0.923 ± 0.059, a sensitivity of 0.750 ± 0.102, an F1 score of 0.743 ± 0.141, and a macro-F1 score of 0.746 ± 0.118. These figures establish a benchmark for evaluating the subsequent modifications.
Independent contribution of dual-branch loss: The integration of the proposed dual-branch loss function (Loss) into the baseline model has led to a comprehensive enhancement in model performance, with accuracy increasing by 1.7% (from 0.826 to 0.840) and the standard deviation reduced by 13.3% (from 0.045 to 0.039). There is a slight improvement in specificity, reaching 0.926 with a standard deviation of 0.055. Sensitivity is enhanced by 7.7% (from 0.750 to 0.808) with a standard deviation reduced by 7.8% (from 0.102 to 0.094). The F1 score and macro-F1 score have also seen improvements of 5.1% (from 0.743 to 0.781) and 6.4% (from 0.746 to 0.794), respectively. This indicates that the dual-branch loss function contributes to the model’s predictive performance while maintaining its ability to correctly identify negative samples.
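To make the dual-branch idea concrete, the toy sketch below combines an invariance term over two augmented views (standing in for the correlation branch) with a nearest-centroid clustering term. The weighting alpha and the use of plain squared distances are illustrative assumptions, not the paper's exact Lcon formulation.

```python
def mse(a, b):
    """Mean squared distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def invariance_loss(view1, view2):
    """Correlation branch (schematic): pull embeddings of two augmented
    views of the same image toward each other."""
    return sum(mse(a, b) for a, b in zip(view1, view2)) / len(view1)

def cluster_loss(embeddings, centroids):
    """Clustering branch (schematic): pull each embedding toward its
    nearest centroid, encouraging compact clusters."""
    return sum(min(mse(e, c) for c in centroids)
               for e in embeddings) / len(embeddings)

def dual_branch_loss(view1, view2, centroids, alpha=0.5):
    # alpha is a hypothetical balancing weight, not the paper's value
    return (alpha * invariance_loss(view1, view2)
            + (1 - alpha) * cluster_loss(view1, centroids))

v1 = [[1.0, 0.0], [0.0, 1.0]]        # embeddings of view 1 (toy)
v2 = [[0.8, 0.2], [0.2, 0.8]]        # embeddings of view 2 (toy)
cents = [[1.0, 0.0], [0.0, 1.0]]     # current cluster centroids (toy)
print(round(dual_branch_loss(v1, v2, cents), 3))  # 0.02
```

Minimizing the first term enforces view consistency while the second stabilizes the learned feature space around cluster structure, which is consistent with the variance reduction observed above.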
Independent contribution of MsHop: After introducing the multi-scale high-order pooling module (MsHop) into the baseline model, the five evaluation metrics have been improved to 0.862, 0.928, 0.885, 0.826, and 0.854 respectively, with lower standard deviations compared to the baseline model, showing a more pronounced improvement than the loss function. Specifically, accuracy increased by 4.4% (from 0.826 to 0.862), specificity by 0.5% (from 0.923 to 0.928), sensitivity by 18.0% (from 0.750 to 0.885), F1 score by 11.2% (from 0.743 to 0.826), and macro-F1 score by 14.5% (from 0.746 to 0.854). This suggests that the multi-scale high-order pooling module effectively captures features at different scales, which is important for classification tasks.
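The intuition behind combining multi-scale and high-order information can be sketched as follows: compute second-order (Gram) channel statistics at more than one spatial scale and concatenate them. This is a schematic stand-in, not the actual MsHop implementation.

```python
def second_order_pool(features):
    """High-order (second-order) pooling: average outer product of the
    feature vectors, capturing pairwise channel interactions rather than
    just per-channel means."""
    d = len(features[0])
    gram = [[0.0] * d for _ in range(d)]
    for f in features:
        for i in range(d):
            for j in range(d):
                gram[i][j] += f[i] * f[j]
    n = len(features)
    return [[v / n for v in row] for row in gram]

def downsample(features, factor=2):
    """Crude multi-scale step: average consecutive feature vectors."""
    out = []
    for k in range(0, len(features) - factor + 1, factor):
        group = features[k:k + factor]
        out.append([sum(f[i] for f in group) / factor
                    for i in range(len(group[0]))])
    return out

def mshop_descriptor(features):
    """Schematic MsHop: concatenate flattened second-order statistics
    computed at two scales (the real module's design differs in detail)."""
    desc = []
    for feats in (features, downsample(features)):
        for row in second_order_pool(feats):
            desc.extend(row)
    return desc

# Toy 2-channel features at 4 spatial positions (illustrative only)
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(len(mshop_descriptor(feats)))  # 2x2 Gram at two scales -> 8 values
```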
Combined effect: When both the dual-branch loss function and the multi-scale high-order pooling module are combined with the baseline model, the five evaluation metrics reach 0.884, 0.932, 0.912, 0.844, and 0.877, representing an increase of 7.0%, 1.0%, 21.6%, 13.6%, and 17.6% respectively compared to the baseline model. The standard deviations have decreased by 15.6%, 20.3%, 10.8%, 28.4%, and 18.6%. It can be further observed that the loss module contributes more to the reduction of variance, effectively enhancing the model’s stability, while the multi-scale high-order pooling module contributes more to performance improvement. The combination of these techniques demonstrates a synergistic effect, enhancing the overall classification performance of the GBC model using B-ultrasound images.
Effect of active learning
The experimental results presented in Fig 5 offer insights into the benefits of using a subset of data for training deep learning models. With only 35% of the total data, the time required for one training epoch is significantly reduced to 20 seconds, compared to 36 seconds when using the full dataset. This reduction in training time is crucial for rapid prototyping and iterative model refinement. Additionally, the storage requirements are substantially lower, with only 19.1 megabytes needed for 35% of the data, compared to 54.7 megabytes for the complete set. This decrease in storage demand reduces pressure on memory and storage, facilitates more efficient data management, and can lower costs associated with data storage and processing. These advantages underscore the potential for a more sustainable and cost-effective approach to deep learning model development, especially in scenarios where computational resources are limited or rapid model deployment is desired.
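The 44% training-time and 65% storage savings quoted in the conclusions follow directly from these figures, as a quick check confirms:

```python
def pct_reduction(full, subset):
    """Relative savings (%) from using the active-learning subset."""
    return round(100 * (full - subset) / full, 1)

print(pct_reduction(36, 20))      # epoch time: 36 s -> 20 s
print(pct_reduction(54.7, 19.1))  # storage: 54.7 MB -> 19.1 MB
```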
The experimental analysis shown in Fig 6 reveals a stark contrast between random data selection and AL strategies. Initially, at 15% data selection, the algorithm with AL achieves a modest accuracy of 0.5479, comparable to random selection. However, as the percentage of data increases, AL demonstrates a superior ability to enhance model accuracy, reaching 0.8023 with just 25% of the data. This trend continues, with accuracy peaking at 0.8831 when 35% of the data is utilized, a level that random selection attains (0.8839) only when the full dataset is used. These results underscore the efficacy of AL in optimizing model performance with a fraction of the data, highlighting its potential for efficient and targeted data acquisition in machine learning processes.
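Curves like those in Fig 6 typically come from an acquisition loop that labels the samples the current model is least certain about. The sketch below shows generic entropy-based uncertainty sampling; it is an illustration of that loop, not the specific AL criterion used in ASGBC.

```python
import math

def entropy(probs):
    """Predictive entropy: higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(unlabeled, predict, budget):
    """Uncertainty sampling (schematic): send the `budget` samples the
    current model is least certain about to the annotator."""
    scored = sorted(unlabeled, key=lambda x: entropy(predict(x)),
                    reverse=True)
    return scored[:budget]

# Toy model: class probabilities keyed by sample id (illustrative only)
preds = {"a": [0.99, 0.01], "b": [0.55, 0.45], "c": [0.70, 0.30]}
picked = select_for_labeling(["a", "b", "c"], lambda x: preds[x], budget=2)
print(picked)  # ['b', 'c']: the two most uncertain samples
```

Repeating this select-label-retrain cycle concentrates annotation effort on informative samples, which is why accuracy rises much faster than under random selection.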
The dataset used in this study was collected from PGIMER, a tertiary care referral hospital located in Chandigarh, northern India. All ultrasound images were acquired by radiologists using the Logiq S8 system. While this dataset provides a valuable foundation for model development, we acknowledge that its single-center origin may limit the generalizability of the model. Variations in ultrasound equipment brands and models across different hospitals could potentially affect model performance when applied in new clinical settings. For instance, differences in image resolution, contrast, and noise levels among devices may lead to performance degradation in environments that differ from the training data.
In addition, patient demographics and disease distributions may vary across institutions, including factors such as age, gender composition, and prevalence of specific conditions. These discrepancies could further influence the model’s generalization ability.
To address these limitations, we have outlined a comprehensive plan for future external validation. With sufficient funding, we aim to collaborate with multiple medical centers to conduct multi-institutional validation studies. This will allow us to evaluate the model’s performance across a broader range of ultrasound devices, including those manufactured by Siemens, Philips, and other vendors. By testing the model in diverse clinical environments, we can better assess its stability and adaptability across different equipment and patient populations.
Furthermore, future work will focus on refining and optimizing the model to enhance its robustness and adaptability under varying conditions. We believe that through multi-center external validation, we can more thoroughly evaluate the clinical utility of the model and provide a more solid foundation for its real-world application.
Conclusions
This study introduces ASGBC, an innovative approach to GBC classification that integrates AL with SSL. The proposed method effectively leverages the advantages of both strategies, reducing annotation and training resource requirements. It achieves accuracy comparable to using the full dataset with just 35% of the data, reducing training time by 44% and memory requirements by 65%. This is particularly valuable in scenarios with limited computational resources or a need for rapid model deployment. The feature extraction module, tailored to the characteristics of B-ultrasound imaging, integrates multi-scale and high-order information retrieval capabilities. The dual-branch loss function considers both data relevance and clustering features, enhancing accuracy and model stability. The classification accuracy, specificity, sensitivity, F1 score, and macro-F1 score reached 0.884, 0.932, 0.912, 0.844, and 0.877, respectively. Compared to the baseline model, these metrics improved by 7.0%, 1.0%, 21.6%, 13.6%, and 17.6%, respectively, while the standard deviations decreased by 15.6%, 20.3%, 10.8%, 28.4%, and 18.6%, indicating greater model stability. These enhancements suggest that the proposed algorithm is suitable for clinical applications where precise and timely diagnosis is crucial. Overall, our research contributes to the advancement of medical imaging analysis by providing a practical solution for GBC detection. Future work will continue to refine the model, exploring additional enhancements and broader applications in medical diagnostics. We also plan to explore lightweight variants of the MsHop module, such as channel pruning, efficient convolution designs, or dynamic routing mechanisms, to reduce computational overhead while preserving performance, and to optimize the overall architecture to balance accuracy and efficiency more effectively.
Supporting information
S1 Appendix. Comprehensive derivation of confusion matrices and Kappa coefficients.
https://doi.org/10.1371/journal.pone.0330781.s001
(PDF)
References
- 1. Roa JC, García P, Kapoor VK, Maithel SK, Javle M, Koshiol J. Gallbladder cancer. Nat Rev Dis Primers. 2022;8(1):69. pmid:36302789
- 2. International Agency for Research on Cancer. Gallbladder Fact Sheet. 2022. https://gco.iarc.who.int/media/globocan/factsheets/cancers/12-gallbladder-fact-sheet.pdf
- 3. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49. pmid:33538338
- 4. Abou-Alfa GK, Jarnagin W, El Dika I, D’Angelica M, Lowery M, Brown K. Liver and bile duct cancer. Abeloff’s clinical oncology. Elsevier; 2020. p. 1314–41.
- 5. Ellington TD, Momin B, Wilson RJ, Henley SJ, Wu M, Ryerson AB. Incidence and mortality of cancers of the biliary tract, gallbladder, and liver by sex, age, race/ethnicity, and stage at diagnosis: United States, 2013 to 2017. Cancer Epidemiol Biomarkers Prev. 2021;30(9):1607–14. pmid:34244156
- 6. Yu MH, Kim YJ, Park HS, Jung SI. Benign gallbladder diseases: Imaging techniques and tips for differentiating with malignant gallbladder diseases. World J Gastroenterol. 2020;26(22):2967–86. pmid:32587442
- 7. Yuan H-X, Cao J-Y, Kong W-T, Xia H-S, Wang X, Wang W-P. Contrast-enhanced ultrasound in diagnosis of gallbladder adenoma. Hepatobiliary Pancreat Dis Int. 2015;14(2):201–7. pmid:25865694
- 8. Feng H, Yang B, Wang J, Liu M, Yin L, Zheng W. Identifying malignant breast ultrasound images using ViT-patch. Applied Sciences. 2023;13(6):3489.
- 9. He Q, Yang Q, Xie M. HCTNet: a hybrid CNN-transformer network for breast ultrasound image segmentation. Comput Biol Med. 2023;155:106629. pmid:36787669
- 10. Chen X, Liu X, Wu Y, Wang Z, Wang SH. Research related to the diagnosis of prostate cancer based on machine learning medical images: a review. Int J Med Inform. 2024;181:105279. pmid:37977054
- 11. Jiang H, Imran M, Muralidharan P, Patel A, Pensa J, Liang M, et al. MicroSegNet: a deep learning approach for prostate segmentation on micro-ultrasound images. Comput Med Imaging Graph. 2024;112:102326. pmid:38211358
- 12. Gong H, Chen J, Chen G, Li H, Li G, Chen F. Thyroid region prior guided attention for ultrasound segmentation of thyroid nodules. Comput Biol Med. 2023;155:106389. pmid:36812810
- 13. Zhou J, Tian H, Wang W. Fully automated thyroid ultrasound screening utilizing multi-modality image and anatomical prior. Biomedical Signal Processing and Control. 2024;87:105430.
- 14. Li Z, Yang J, Wang X, Zhou S. Establishment and evaluation of intelligent diagnostic model for ophthalmic ultrasound images based on deep learning. Ultrasound Med Biol. 2023;49(8):1760–7. pmid:37137742
- 15. Feng L, Zhang Y, Wei W, Qiu H, Shi M. Applying deep learning to recognize the properties of vitreous opacity in ophthalmic ultrasound images. Eye (Lond). 2024;38(2):380–5. pmid:37596401
- 16. Lucassen RT, Jafari MH, Duggan NM, Jowkar N, Mehrtash A, Fischetti C, et al. Deep learning for detection and localization of B-lines in lung ultrasound. IEEE J Biomed Health Inform. 2023;27(9):4352–61. pmid:37276107
- 17. Custode LL, Mento F, Tursi F, Smargiassi A, Inchingolo R, Perrone T, et al. Multi-objective automatic analysis of lung ultrasound data from COVID-19 patients by means of deep learning and decision trees. Appl Soft Comput. 2023;133:109926. pmid:36532127
- 18. Fiorentino MC, Villani FP, Di Cosmo M, Frontoni E, Moccia S. A review on deep-learning algorithms for fetal ultrasound-image analysis. Med Image Anal. 2023;83:102629. pmid:36308861
- 19. Ramirez Zegarra R, Ghi T. Use of artificial intelligence and deep learning in fetal ultrasound imaging. Ultrasound Obstet Gynecol. 2023;62(2):185–94. pmid:36436205
- 20. Ren P, Xiao Y, Chang X, Huang PY, Li Z, Gupta BB. A survey of deep active learning. ACM Computing Surveys. 2021;54(9):1–40.
- 21. Dong Z, Huang X, Yuan G, Zhu H, Xiong H. Butterfly-core community search over labeled graphs. Proceedings of the VLDB Endowment. 2021;14(11):2006–18.
- 22. Shurrab S, Duwairi R. Self-supervised learning methods and applications in medical imaging analysis: a survey. PeerJ Comput Sci. 2022;8:e1045. pmid:36091989
- 23. Hang J, Dong Z, Zhao H, Song X, Wang P, Zhu H. Outside in: Market-aware heterogeneous graph neural network for employee turnover prediction. In: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 2022. p. 353–62.
- 24. Ye Y, Dong Z, Zhu H, Xu T, Song X, Yu R, et al. MANE: organizational network embedding with multiplex attentive neural networks. IEEE Transactions on Knowledge and Data Engineering. 2022;35(4):4047–61.
- 25. Budd S, Robinson EC, Kainz B. A survey on active learning and human-in-the-loop deep learning for medical image analysis. Med Image Anal. 2021;71:102062. pmid:33901992
- 26. Shen D, Qin C, Wang C, Dong Z, Zhu H, Xiong H. Topic modeling revisited: a document graph-based neural network perspective. Advances in Neural Information Processing Systems. 2021;34:14681–93.
- 27. Nguyen NQ, Le TS. A semi-supervised learning method to remedy the lack of labeled data. In: 2021 15th International Conference on Advanced Computing and Applications (ACOMP). 2021. p. 78–84.
- 28. Grill JB, Strub F, Altché F, Tallec C, Richemond P, Buchatskaya E. Bootstrap your own latent: a new approach to self-supervised learning. Advances in Neural Information Processing Systems. 2020;33:21271–84.
- 29. Mishra AK, Roy P, Bandyopadhyay S, Das SK. CR-SSL: a closely related self-supervised learning based approach for improving breast ultrasound tumor segmentation. International Journal of Imaging Systems and Technology. 2022;32(4):1209–20.
- 30. Zhao Z, Yang G. Unsupervised contrastive learning of radiomics and deep features for label-efficient tumor classification. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II. 2021. p. 252–61.
- 31. Jiao J, Droste R, Drukker L, Papageorghiou AT, Noble JA. Self-supervised representation learning for ultrasound video. Proc IEEE Int Symp Biomed Imaging. 2020;2020:1847–50. pmid:32489519
- 32. Qi H, Collins S, Noble JA. Knowledge-guided pretext learning for utero-placental interface detection. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I. 2020. p. 582–93.
- 33. Liu C, Qiao M, Jiang F, Guo Y, Jin Z, Wang Y. TN-USMA Net: Triple normalization-based gastrointestinal stromal tumors classification on multicenter EUS images with ultrasound-specific pretraining and meta attention. Med Phys. 2021;48(11):7199–214. pmid:34412155
- 34. Basu S, Singla S, Gupta M, Rana P, Gupta P, Arora C. Unsupervised contrastive learning of image representations from ultrasound videos with hard negative mining. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. 2022. p. 423–33.
- 35. Basu S, Gupta M, Madan C, Gupta P, Arora C. FocusMAE: gallbladder cancer detection from ultrasound videos with focused masked autoencoders. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. p. 11715–25.
- 36. Lian J, Ma Y, Ma Y, Shi B, Liu J, Yang Z, et al. Automatic gallbladder and gallstone regions segmentation in ultrasound image. Int J Comput Assist Radiol Surg. 2017;12(4):553–68. pmid:28063077
- 37. Jeong Y, Kim JH, Chae H-D, Park S-J, Bae JS, Joo I, et al. Deep learning-based decision support system for the diagnosis of neoplastic gallbladder polyps on ultrasonography: Preliminary results. Sci Rep. 2020;10(1):7700. pmid:32382062
- 38. Kim T, Choi YH, Choi JH, Lee SH, Lee S, Lee IS. Gallbladder polyp classification in ultrasound images using an ensemble convolutional neural network model. J Clin Med. 2021;10(16):3585. pmid:34441881
- 39. Basu S, Gupta M, Rana P, Gupta P, Arora C. Surpassing the human accuracy: detecting gallbladder cancer from USG images with curriculum learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. p. 20886–96.
- 40. Shuvo SB, Chowdhury MZ. Classification of gallbladder cancer using average ensemble learning. In: 2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT). 2024. p. 1450–5.
- 41. Sener O, Savarese S. Active learning for convolutional neural networks: a core-set approach. arXiv preprint 2017. https://arxiv.org/abs/1708.00489
- 42. Pinsler R, Gordon J, Nalisnick E, Hernández-Lobato JM. Bayesian batch active learning as sparse subset approximation. Advances in Neural Information Processing Systems. 2019;32.
- 43. Sinha S, Ebrahimi S, Darrell T. Variational adversarial active learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019. p. 5972–81.
- 44. Xie B, Yuan L, Li S, Liu CH, Cheng X, Wang G. Active learning for domain adaptation: an energy-based approach. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2022. p. 8708–16.
- 45. Cabannes V, Bottou L, Lecun Y, Balestriero R. Active self-supervised learning: a few low-cost relationships are all you need. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023. p. 16274–83.
- 46. Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning. 2020. p. 1597–607.
- 47. He K, Fan H, Wu Y, Xie S, Girshick R. Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. p. 9729–38.
- 48. Caron M, Misra I, Mairal J, Goyal P, Bojanowski P, Joulin A. Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems. 2020;33:9912–24.
- 49. Zbontar J, Jing L, Misra I, LeCun Y, Deny S. Barlow twins: self-supervised learning via redundancy reduction. In: International Conference on Machine Learning. 2021. p. 12310–20.
- 50. He K, Chen X, Xie S, Li Y, Dollar P, Girshick R. Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. p. 16000–9.
- 51. Chen K, Liu Z, Hong L, Xu H, Li Z, Yeung DY. Mixed autoencoder for self-supervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. p. 22742–51.
- 52. Bao H, Dong L, Piao S, Wei F. Beit: bert pre-training of image transformers. arXiv preprint 2021. https://arxiv.org/abs/2106.08254
- 53. Huang Z, Jin X, Lu C, Hou Q, Cheng M-M, Fu D, et al. Contrastive masked autoencoders are stronger vision learners. IEEE Trans Pattern Anal Mach Intell. 2024;46(4):2506–17. pmid:38015699
- 54. Mishra S, Robinson J, Chang H, Jacobs D, Sarna A, Maschinot A, et al. A simple, efficient and scalable contrastive masked autoencoder for learning visual representations. arXiv preprint 2022. https://arxiv.org/abs/2210.16870
- 55. Oquab M, Darcet T, Moutakanni T, Vo H, Szafraniec M, Khalidov V. Dinov2: learning robust visual features without supervision. arXiv preprint 2023. https://arxiv.org/abs/2304.07193
- 56. Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M. beta-VAE: learning basic visual concepts with a constrained variational framework. 2017. https://openreview.net/forum?id=Sy2fzU9gl
- 57. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S. Generative adversarial networks. Communications of the ACM. 2020;63(11):139–44.
- 58. Bardes A, Ponce J, LeCun Y. Vicreg: variance-invariance-covariance regularization for self-supervised learning. arXiv preprint. 2021. https://arxiv.org/abs/2105.04906