
BUSClean: Open-source software for breast ultrasound image pre-processing and knowledge extraction for medical AI

  • Arianna Bunnell ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Writing – original draft

    abunnell@hawaii.edu

    Affiliations Department of Information and Computer Sciences, University of Hawai’i at Mānoa, Honolulu, HI, United States of America, University of Hawai’i Cancer Center, Honolulu, HI, United States of America

  • Kailee Hung,

    Roles Investigation, Methodology, Software, Validation

    Affiliation Department of Information and Computer Sciences, University of Hawai’i at Mānoa, Honolulu, HI, United States of America

  • John A. Shepherd,

    Roles Funding acquisition, Resources, Supervision, Writing – review & editing

    Affiliation University of Hawai’i Cancer Center, Honolulu, HI, United States of America

  • Peter Sadowski

    Roles Methodology, Supervision, Writing – review & editing

    Affiliation Department of Information and Computer Sciences, University of Hawai’i at Mānoa, Honolulu, HI, United States of America

Abstract

Development of artificial intelligence (AI) for medical imaging demands curation and cleaning of large-scale clinical datasets comprising hundreds of thousands of images. Some modalities, such as mammography, contain highly standardized imaging. In contrast, breast ultrasound imaging (BUS) can contain many irregularities not indicated by scan metadata, such as enhanced scan modes, sonographer annotations, or additional views. We present an open-source software solution for automatically processing clinical BUS datasets. The algorithm performs BUS scan filtering (flagging of invalid and non-B-mode scans), cleaning (dual-view scan detection, scan area cropping, and caliper detection), and knowledge extraction (BI-RADS Labeling and Measurement fields) from sonographer annotations. Its modular design enables users to adapt it to new settings. Experiments on an internal testing dataset of 430 clinical BUS images achieve >95% sensitivity and >98% specificity in detecting every type of text annotation, and >98% sensitivity and specificity in detecting scans with blood flow highlighting, alternative scan modes, or invalid scans. A case study on a completely external, public dataset of BUS scans found that BUSClean identified text annotations and scans with blood flow highlighting with 88.6% and 90.9% sensitivity and 98.3% and 99.9% specificity, respectively. Adaptation of the lesion caliper detection method to account for a caliper style specific to the case study demonstrates the intended use of BUSClean on new data distributions, improving lesion caliper detection from 43.3% sensitivity and 93.3% specificity out-of-the-box to 92.1% and 92.3%, respectively. Source code, example notebooks, and sample data are available at https://github.com/hawaii-ai/bus-cleaning.

Introduction

The development of artificial intelligence (AI) for diagnosis and treatment planning from medical imaging is an exploding area of research. Medical imaging AI for breast ultrasound imaging (BUS) is no exception, with 103 papers using "breast", "ultrasound", and "AI" in the title/abstract indexed by PubMed since 2023 alone. Data curation and cleanliness are closely linked to the quality and robustness of the resulting AI systems, and curation of clinical medical imaging data presents a significant challenge for researchers. In contrast to highly standardized modalities such as screening mammography, in which standardized views are captured with little operator-dependent annotation, clinical BUS data may contain a variety of burnt-in annotations, artifacts, and other scan abnormalities. For example, a single diagnostic BUS image may include all of the following artifacts burnt into the image: calipers for lesion measurement, free-text exam positioning information, free-text notes on patient symptoms, blood flow highlighting, and overlays describing software settings. If not removed, cleaned, or otherwise accounted for, these abnormalities introduce noise into the medical imaging AI development pipeline and may result in unexpected relationships between abnormalities being learned by the model or artificial inflation of model performance [1,2]. Reference [3] presents an example of the positive effect of data cleaning when developing AI for BUS specifically. Deep learning model training and evaluation procedures rely on large quantities of relatively clean data being available, necessitating a mostly automatic curation and cleaning pipeline; otherwise, the volume of data to be manually reviewed becomes unmanageable. For example, [4] develops a BUS AI system trained on five million clinical images which were automatically curated through an internal process described in [5].

In this work, we introduce BUSClean: an open-source software package for the curation of clinical BUS image datasets for ingestion into AI development or evaluation pipelines. As part of BUSClean we propose methods for the identification of invalid and enhanced BUS scans; the recognition and classification of text annotations into meaningful categories; and automatic cropping to the scan area. The main contributions of this paper are summarized as follows: (1) release of the only open-source software solution for automatic BUS dataset curation; (2) the first application, to the authors' knowledge, of OCR techniques for detection of scan position, anatomy, and procedure from sonographer annotations; (3) demonstration of BUSClean on unseen internal data showing retained high performance; and (4) presentation of a case study showing the intended use of BUSClean for adaptation to specific clinical data distributions.

Materials and methods

BUSClean was developed to aid in the preparation of clinical BUS imaging datasets for use in deep learning training and evaluation pipelines using the OpenCV [6] and PIL [7] libraries in Python. Fig 1 gives an outline of the flow and functions of the software. Our algorithm consists of a sequence of preprocessing steps. The steps are modular and can be added or removed by the user. All functions of BUSClean can be categorized into either scan filtering, wherein we are flagging scans with certain artifacts, scan cleaning, wherein we are applying transformations to the scans themselves to increase uniformity, or knowledge extraction, wherein we are extracting metadata from the scan image.

Fig 1. BUSClean process diagram.

Complete process diagram depicting all functions of the BUSClean open-source software for the purpose of standardizing and extracting metadata from scans before AI model training or evaluation.

https://doi.org/10.1371/journal.pone.0315434.g001

Scan filtering is performed to remove scans with disruptive artifacts from development datasets for deep learning training and evaluation. Scan cleaning removes the artifact itself while retaining the image. Without extensive cleaning, artifacts may lead to artificially diminished or inflated AI performance being reported when using opportunistic or clinical datasets. For example, if a sonographer places measurement calipers on a BUS scan to indicate a lesion and this scan is used to train a lesion detection model, the model may incorrectly learn to detect calipers rather than the actual lesions. An AI model trained on uncurated BUS data may also learn to "cheat" in predictions by using features in the image that are unrelated to the task but correlated with the target variable. This phenomenon is known as the "clever Hans effect" in the machine learning literature. [8–10] present examples of the "clever Hans effect" being discovered in medical AI. As an example, sonographers may be more likely to add text annotations to a scan which has a higher likelihood of malignancy, due to an increase in notable features or patient symptoms. An AI model trained on this data may then learn to falsely associate the presence of text annotations with lesion malignancy, rather than identifying features of the breast tissue or lesion as being indicative of malignancy.

BUS scans are typically stored as DICOM (Digital Imaging and Communications in Medicine) images, containing embedded header fields which store metadata about the exam. We found that in our internal clinical dataset, the scan features and artifacts that BUSClean detects could not be identified by examination of the DICOM header alone.
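
As a concrete illustration of this header check, the following sketch reads two standard DICOM tags that might plausibly indicate color data or a derived capture. The loader (pydicom) and the file path are our assumptions; the paper does not specify its DICOM tooling.

```python
# Hedged sketch: inspect DICOM header fields that *might* indicate scan mode.
# pydicom and the file path are assumptions for illustration only.
import pydicom

ds = pydicom.dcmread("scan.dcm")  # hypothetical path

# (0028,0014) Ultrasound Color Data Present: nonzero if color data is embedded.
color_flag = ds.get("UltrasoundColorDataPresent")
# (0008,0008) Image Type: sometimes distinguishes derived or secondary captures.
image_type = ds.get("ImageType")

print(color_flag, image_type)
# In the authors' clinical dataset, header fields such as these did not
# reliably reflect burnt-in annotations, calipers, or enhanced scan modes.
```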

Scan filtering

BUS scans can be captured in unenhanced brightness mode (B-mode), with blood flow highlighting (Color or Power Doppler highlighting), or with elastography (US modality which provides information about tissue stiffness). In medical AI training, it is desirable to define your population of images precisely (such as only shear-wave elastography BUS images) to reduce the amount of non-task-related variance present in your dataset which may add noise to your model learning. Typically, in deep learning model development for BUS, researchers implicitly define their population of images as B-mode images. [4,1113] are some examples of models developed for B-mode imaging. There is also a body of work which develops AI for BUS elastography imaging [1417]. In medical AI evaluation, presentation of images unfamiliar to a model, for example a model trained on Color Doppler images being presented with B-mode images, may make the model perform unexpectedly. To reduce the risk of images outside of the target population being included in AI model pipelines, we develop two scan filtering methods to enable the flagging and optional removal of elastography, blood flow highlighted, and invalid BUS images from clinical BUS datasets.

Scans are defined as invalid if more than 75% of the scan area is black (grayscale pixel value of less than five). This method identifies scans which contain very little tissue, scans which only contain burnt-in annotations (such as chaperone name), and scans where the BUS machine malfunctioned. This method may also identify and flag scans in which most of the scan area is a hypoechoic lesion or implant.
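
A minimal sketch of this rule, assuming the scan area has already been cropped. The function name and parameter names are ours; the thresholds (grayscale value 5, 75% dark fraction) come from the text.

```python
# Minimal sketch of the invalid-scan rule: flag a scan area that is mostly black.
import cv2
import numpy as np

def is_invalid_scan(scan_area: np.ndarray,
                    dark_value: int = 5,
                    dark_fraction: float = 0.75) -> bool:
    """Return True if more than dark_fraction of pixels are darker than dark_value."""
    gray = (cv2.cvtColor(scan_area, cv2.COLOR_BGR2GRAY)
            if scan_area.ndim == 3 else scan_area)
    dark_pixels = np.count_nonzero(gray < dark_value)
    return dark_pixels / gray.size > dark_fraction
```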

Elastography and scans with blood flow highlighting are considered broadly to be "non-B-mode" scans in BUSClean. These scans are identified through a four-step process. First, grayscale scans are removed from the process and labeled as B-mode. Second, HSV color masks for colors found in blood flow highlighting (orange, green, yellow, red, and blue, depending on scan manufacturer and mode) and elastography/blood flow highlighting indicator boxes (green and white) are created and dilated. Third, complete and partial rectangles, as well as image-spanning lines, are identified in the green/white threshold image; if any are present, the image is classified as non-B-mode. Lastly, remaining scans are labeled as non-B-mode if the blood flow highlighting mask comprises more than 0.5% of the scan area. The complete process is displayed for three example scans in Fig 2.
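
The four-step filter might be sketched as follows. The HSV bounds, kernel size, and rectangle-area cutoff are illustrative placeholders, not the authors' calibrated values, and the polygon test is a simplified stand-in for BUSClean's complete/partial rectangle and line detection.

```python
# Hedged sketch of the four-step non-B-mode filter. All numeric values are
# illustrative assumptions, not the calibrated values used by BUSClean.
import cv2
import numpy as np

def is_non_b_mode(scan_bgr: np.ndarray, highlight_fraction: float = 0.005) -> bool:
    # Step 1: a scan whose three channels are identical is grayscale -> B-mode.
    b, g, r = cv2.split(scan_bgr)
    if np.array_equal(b, g) and np.array_equal(g, r):
        return False

    hsv = cv2.cvtColor(scan_bgr, cv2.COLOR_BGR2HSV)
    kernel = np.ones((5, 5), np.uint8)

    # Step 2: build and dilate color masks (bounds here are placeholders).
    flow_mask = cv2.dilate(cv2.inRange(hsv, (0, 120, 120), (30, 255, 255)), kernel)
    box_mask = cv2.dilate(cv2.inRange(hsv, (40, 60, 60), (80, 255, 255)), kernel)

    # Step 3: a large four-sided contour in the green mask suggests an
    # elastography / Doppler indicator box.
    contours, _ = cv2.findContours(box_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    min_area = 0.05 * scan_bgr.shape[0] * scan_bgr.shape[1]
    for c in contours:
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4 and cv2.contourArea(c) > min_area:
            return True

    # Step 4: enough highlighted pixels (>0.5% of the scan area) -> non-B-mode.
    return np.count_nonzero(flow_mask) / flow_mask.size > highlight_fraction
```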

Fig 2. Detecting non-B-mode scans.

Diagram showing the step-by-step processing of detecting non-B-mode scans. The top row shows a BUS scan with blood flow highlighting being successfully detected with four contours detected, indicating a rectangle in the image which is characteristic of blood flow highlighting being applied. The bottom row shows an elastography scan being successfully detected via recognition of the extended vertical line in the elastography overlay.

https://doi.org/10.1371/journal.pone.0315434.g002

Knowledge extraction

Sonographers will often annotate BUS scan images with text annotations burnt into the image file to provide additional context for image interpretation (e.g., indicating an area of pain or a palpable lump), anatomical information (e.g., labeling the nipple location or an area of scarring), and descriptive information about the scan collection process. The American College of Radiology's (ACR) Breast Imaging Reporting & Data System (BI-RADS) for US specifies, in Section IC: Labeling and Measurement, that all BUS exams should carry permanent identification containing the following: (1) facility name and location; (2) examination date; (3) patient's first and last name; (4) patient identification number or date of birth; (5) designation of left or right breast; (6) anatomic location using clock-face notation or a labeled diagram; (7) transducer orientation; (8) distance from the nipple in centimeters; and (9) sonographer's and/or physician's identifying number/initials/symbol [18].

Items (1)–(4) and (9) from the ACR labeling guidelines for BUS constitute personally identifying information, which is typically stored in the DICOM header metadata and removed before scans are exported from clinical centers for deep learning training. Items (5)–(8), however, contain non-identifiable information about the scan itself, are usually burnt in to the image, and may contain information useful for AI training or evaluation. For example, extraction of laterality and distance from the nipple or an abnormality may allow scans which are unlikely to include a lesion to be filtered out automatically before classification or segmentation, AI training, or consulting-radiologist labeling.

BUSClean identifies items (5)–(8) from BUS scan images using a combination of Optical Character Recognition (OCR), via EasyOCR (Jaided AI; Bangkok, Thailand), and regular expression-based pattern matching. BUSClean automatically recognizes anatomic location only in clock-face notation, the text method recommended by the ACR [18]; quadrant-style notation (LOQ for lower outer quadrant, RIQ for radial inner quadrant, etc.) is not recognized, as this style is not explicitly recommended in the ACR guidelines [18]. See Fig 3 for an illustration of BUSClean text field matching.
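
A sketch of this OCR-plus-regex pipeline using EasyOCR [19]. The patterns below are simplified illustrations of laterality, clock-face, orientation, and nipple-distance matching; BUSClean's actual expressions live in the repository, and the file path is hypothetical.

```python
# Hedged sketch: OCR the burnt-in text, then match simplified field patterns.
# The regular expressions are illustrative, not BUSClean's production patterns.
import re
import easyocr

reader = easyocr.Reader(["en"])
# detail=0 returns recognized strings only (no bounding boxes).
tokens = reader.readtext("scan.png", detail=0)  # hypothetical path
text = " ".join(t.upper() for t in tokens)

fields = {
    "laterality": re.search(r"\b(LEFT|RIGHT|LT|RT)\b", text),
    "clock_face": re.search(r"\b(1[0-2]|[1-9])\s*(:\s*[0-5][0-9])?\s*O'?CLOCK\b", text),
    "orientation": re.search(r"\b(RAD(IAL)?|ANTI[- ]?RAD(IAL)?|TRANS(VERSE)?|SAG(ITTAL)?|LONG)\b", text),
    "nipple_distance": re.search(r"\b\d+(\.\d+)?\s*CM\b", text),
}
print({name: bool(match) for name, match in fields.items()})
```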

Fig 3. BUSClean text field matching.

Two example scans with text detected by EasyOCR [19] and the fields recognized after pattern matching by BUSClean.

https://doi.org/10.1371/journal.pone.0315434.g003

In addition to the ACR-defined items, BUSClean also identifies and flags scans with text indicating what we define as procedural imaging, lesion measurements, and imaging of the axillary region. Procedural imaging, by our definition, is any imaging containing text which refers to a surgical procedure, such as US-guided biopsy; we also include imaging which notes the presence of a clip, marker, or coil in this category. Due to the abnormality of these findings, they may be worth excluding from deep learning model development pipelines: AI models trained to recognize lesions as tissue abnormalities may erroneously flag clips, markers, or coils as lesions due to their high contrast with surrounding tissue. Lesion measurements may or may not coincide with measurement calipers (see Fig 4 for an example) and could be used to automatically exclude lesions smaller or larger than the target population, as in [20,21]. Imaging of the axillary region typically focuses on the lymph nodes for detection of cancer metastasis, rather than on a specific lesion or area of interest, placing it outside the target image population for AI models designed for lesion detection and segmentation from BUS.
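
Illustrative keyword patterns for these three categories are sketched below; the word lists are examples of terms one might match, not BUSClean's full vocabularies.

```python
# Hedged sketch: flag OCR text for the three additional categories. The
# keyword lists are illustrative assumptions, not BUSClean's actual patterns.
import re

CATEGORY_PATTERNS = {
    "procedural": re.compile(r"\b(BIOPSY|BX|CLIP|MARKER|COIL|NEEDLE|WIRE)\b"),
    "measurement": re.compile(r"\b\d+(\.\d+)?\s*(CM|MM)\b"),
    "axilla": re.compile(r"\b(AXILLA(RY)?|AX|LYMPH|NODE)\b"),
}

def categorize(ocr_text: str) -> dict:
    """Return a True/False flag for each text category."""
    text = ocr_text.upper()
    return {name: bool(p.search(text)) for name, p in CATEGORY_PATTERNS.items()}
```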

Fig 4. Lesion caliper detection process diagram.

Diagram showing the step-by-step process of detecting calipers in a BUS scan. The top and bottom rows show scans where all calipers are successfully detected and the image flagged for caliper presence. The middle row shows a scan where two of the four calipers are initially cropped out, but the scan is still identified as having calipers because two remain in the reduced scan area.

https://doi.org/10.1371/journal.pone.0315434.g004

Artifact detection

After capture of a scan frame showing a lesion or mass of interest, the examining sonographer may choose to annotate the frame with measurement calipers or capture another orthogonal view to be displayed alongside the initial frame in a single imaging record. The presence of lesion calipers in BUS scans used for AI model development may lead to hard-to-detect biases in model decision-making which are not frequently mentioned in BUS AI model development. Many papers which develop AI on the BUSI dataset [22] in particular do not mention manual or automatic filtering of lesion calipers prior to the training or evaluation of their methods; some examples are [23–26]. Failure to remove lesion calipers is problematic for methods performing lesion segmentation or detection, as the location of the lesion is explicitly indicated by the calipers, leaving the AI to only interpolate lesion boundaries between sonographer-placed calipers and possibly failing entirely to learn how to locate lesions without these markings. As implemented in BUSClean, lesion caliper identification may fall into either the scan filtering or scan cleaning subset of functions.

Dual-view scans (a single DICOM image which contains two images displayed side-by-side for purposes of comparison between two modalities or perspectives of a lesion; see an example in Fig 1) contain a harsh dividing line in the middle of the image which may confuse AI systems in evaluation and do not represent any properties of breast tissue. Leaving dual-view scans in evaluation datasets may lead to artificially low performance results being reported. Additionally, dual-view scans may contain views of two different lesions, one cancerous and one benign, confusing the histological labeling of BUS scans. Dual-view scan identification falls into the scan cleaning subset of BUSClean functions.

Dual-view scans are identified through the following process: (1) teal/green color masks are used to filter out elastography scans; (2) the scan height/width ratio is used to filter out scans with width < 75% of their height; (3) a Canny edge detection filter is applied; and (4) a split is detected if the number of edge pixels on the midline is both greater than 100 and greater than the number of white pixels 10 pixels to the left of the midline plus 10, as well as the number of white pixels 10 pixels to the right of the midline plus 10.
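
A sketch of steps (2)–(4), assuming step (1)'s elastography color filtering has already run; the Canny thresholds are illustrative choices of ours.

```python
# Hedged sketch of the dual-view check. Step (1), elastography filtering via
# color masks, is assumed to have run already. Canny thresholds are illustrative.
import cv2
import numpy as np

def is_dual_view(scan_gray: np.ndarray) -> bool:
    h, w = scan_gray.shape
    if w < 0.75 * h:                       # Step 2: too narrow for two views.
        return False
    edges = cv2.Canny(scan_gray, 100, 200)  # Step 3: edge map.
    mid = w // 2
    mid_count = np.count_nonzero(edges[:, mid])
    left_count = np.count_nonzero(edges[:, mid - 10])
    right_count = np.count_nonzero(edges[:, mid + 10])
    # Step 4: a sharp dividing line carries far more edge pixels on the
    # midline than in its immediate neighborhood.
    return (mid_count > 100
            and mid_count > left_count + 10
            and mid_count > right_count + 10)
```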

Lesion calipers can be white/gray or colored; shaped as crosses, numbers, or "X"s; and may or may not be connected by dotted lines spanning the lesion. In BUSClean, scans with lesion calipers are flagged and the coordinates of the detected calipers returned for optional user-side cropping. Lesion calipers are detected via a two-step process. In the first step, the scan is enhanced: the outer 15% of the image along each dimension is black-masked to reduce the likelihood of software overlays or text being mistakenly identified as calipers; the FIND-EDGES and maximum filters from the PIL [7] library are applied to isolate the caliper shapes and denoise the edge image; and the resulting image is dilated. In the second step, contours are detected in the enhanced image, and any contour whose bounding box is between 10 and 70 pixels in both height and width is counted as a caliper, with its bounding box coordinates returned. Fig 4 displays this two-step processing on three example BUS images.
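
A hedged sketch of the two-step detector. Kernel and filter sizes are illustrative, and cv2.findContours stands in for BUSClean's exact contour handling; the 10-70 pixel bounding-box bounds come from the text.

```python
# Hedged sketch of the two-step caliper detector: PIL edge enhancement
# followed by OpenCV contour size filtering. Kernel sizes are illustrative.
import cv2
import numpy as np
from PIL import Image, ImageFilter

def detect_calipers(scan_gray: np.ndarray):
    h, w = scan_gray.shape
    # Step 1a: black-mask the outer 15% of each dimension to suppress
    # software overlays and text near the borders.
    masked = np.zeros_like(scan_gray)
    mh, mw = int(0.15 * h), int(0.15 * w)
    masked[mh:h - mh, mw:w - mw] = scan_gray[mh:h - mh, mw:w - mw]
    # Step 1b: FIND_EDGES isolates caliper outlines, MaxFilter denoises,
    # then the result is dilated.
    pil = (Image.fromarray(masked)
           .filter(ImageFilter.FIND_EDGES)
           .filter(ImageFilter.MaxFilter(3)))
    enhanced = cv2.dilate(np.array(pil), np.ones((3, 3), np.uint8))
    # Step 2: contours with bounding boxes 10-70 px in both dimensions
    # are counted as calipers; their boxes are returned for optional cropping.
    contours, _ = cv2.findContours(enhanced, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, bw, bh = cv2.boundingRect(c)
        if 10 <= bw <= 70 and 10 <= bh <= 70:
            boxes.append((x, y, bw, bh))
    return boxes
```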

Scan area cropping

Identification and cropping of the scan area prior to AI training removes irrelevant background pixels and software artifacts from the scan and increases the effectiveness of standard image data augmentation methods such as rotation by making the resulting augmented images more plausible. BUSClean’s cropping method is designed to preserve as much tissue in the scan area as possible, while minimizing the amount of background in the final cropped image. Our two-stage method for scan area identification and cropping is adapted from the shape-based cropping method presented in [5]. Scan area cropping falls into the scan cleaning subset of BUSClean functions.

Initially, background pixels are identified and cropped by using binary thresholding (pixel values > mode pixel value + 10), erosion and dilation, and isolation of the largest connected component in the resulting mask. The bounding box surrounding the largest connected component is the first set of cropping coordinates. For rectangular scans, the process ends here, as the scan area completely fills the bounding box.
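
A sketch of this first stage; the morphological kernel size is an illustrative choice of ours.

```python
# Hedged sketch of stage-one cropping: threshold against the mode pixel value,
# clean up with erosion/dilation, and take the bounding box of the largest
# connected component. The 5x5 kernel is an illustrative assumption.
import cv2
import numpy as np

def stage_one_crop(scan_gray: np.ndarray):
    mode_value = int(np.bincount(scan_gray.ravel()).argmax())
    _, mask = cv2.threshold(scan_gray, mode_value + 10, 255, cv2.THRESH_BINARY)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(cv2.erode(mask, kernel), kernel)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n < 2:                      # nothing found; fall back to the full image
        return 0, 0, scan_gray.shape[1], scan_gray.shape[0]
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # skip background
    return (stats[largest, cv2.CC_STAT_LEFT], stats[largest, cv2.CC_STAT_TOP],
            stats[largest, cv2.CC_STAT_WIDTH], stats[largest, cv2.CC_STAT_HEIGHT])
```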

In addition to rectangular scans, BUS scan areas may also be convex, trapezoidal, or irregular (due to excessive shadowing in the lower half of the scan). These scan types benefit from a more aggressive cropping strategy. We define a Euclidean coordinate system on the BUS scan image wherein the topmost, leftmost pixel corresponds to (0,0) and the bounding box has width w and height h. The width of the scan is cropped to the median index of the leftmost (lowest) and rightmost (highest) non-mode-valued pixel over a fixed set of horizontal image slices, and the height of the scan is cropped according to a similar strategy, to the median index of the topmost (lowest) and bottommost (highest) non-mode-valued pixel over a fixed set of vertical image slices. Fig 5 displays the described coordinate system over simulated binary images with convex, trapezoidal, and irregular scan area shapes.
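
The sketch below illustrates the median-of-extremes rule. The slice positions (quarter, half, and three-quarter marks of the bounding box) are a hypothetical choice of ours for illustration; the exact slices are defined in the BUSClean source.

```python
# Hedged sketch of stage-two cropping for non-rectangular scan areas.
# The slice positions below are HYPOTHETICAL; consult the BUSClean source
# for the actual slices used.
import numpy as np

def stage_two_crop(scan_gray: np.ndarray, mode_value: int):
    h, w = scan_gray.shape
    non_mode = scan_gray != mode_value

    def row_extremes(y):   # leftmost/rightmost non-mode pixel in row y
        cols = np.flatnonzero(non_mode[y, :])
        return (cols[0], cols[-1]) if cols.size else (0, w - 1)

    def col_extremes(x):   # topmost/bottommost non-mode pixel in column x
        rows = np.flatnonzero(non_mode[:, x])
        return (rows[0], rows[-1]) if rows.size else (0, h - 1)

    rows = [int(f * (h - 1)) for f in (0.25, 0.5, 0.75)]  # assumed positions
    cols = [int(f * (w - 1)) for f in (0.25, 0.5, 0.75)]  # assumed positions
    lefts, rights = zip(*(row_extremes(y) for y in rows))
    tops, bottoms = zip(*(col_extremes(x) for x in cols))
    # Median of the per-slice extremes gives the final crop boundaries.
    return (int(np.median(lefts)), int(np.median(rights)),
            int(np.median(tops)), int(np.median(bottoms)))
```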

Fig 5. Scan cropping coordinate system.

Diagram displaying the coordinate system overlaid on scans with convex (left), trapezoidal (middle), and irregular (right) scan areas.

https://doi.org/10.1371/journal.pone.0315434.g005

Development dataset

BUSClean was developed on an internal dataset of 2,000 BUS scans from the Hawaiʻi and Pacific Islands Mammography Registry (HIPIMR; Western-Copernicus Group IRB Study Number 1264170). Researchers did not have access to information that could identify individual participants during or after data collection, and the requirement for informed consent was waived. HIPIMR data are available for research via an online data use request. All images were collected from 2009–2023 on the island of Oʻahu. The HIPIMR collects data from three clinical partners: Clinic 1 is a nonprofit healthcare network (comprising four medical centers); Clinic 2 is a private nonprofit tertiary hospital; and Clinic 3 is a diagnostic medical imaging center. BUS images were selected for inclusion via a simple random sample of all images from all women in the HIPIMR with a negative, benign, or probably benign (BI-RADS 1, 2, or 3) BUS visit within one year of a negative screening mammography visit and no personal history of breast cancer. Images from 1,413 patients were included. In the development set, 81% of BUS images were captured on a Philips Medical Systems IU22 system; 11.85% on a Philips ATL HDI 5000; 6.2% on a Toshiba Aplio XG; 0.85% on a Siemens ACUSON S2000; and 0.1% on an Esaote Technos 8234. Data were accessed for research purposes on February 2, 2023. For illustrative purposes, Fig 6 provides examples of all scan and text types present in the development dataset which BUSClean was designed to automatically identify.

Fig 6. Development dataset example images.

Six example images from the development dataset of BUSClean showing all types of scans flagged through scan filtering (enhanced scan mode and invalid scans), scan cleaning (lesion calipers, dual-view scans, and rectangular/trapezoidal scans) and knowledge extraction methods (all text annotation types including procedural imaging and lesion measurements).

https://doi.org/10.1371/journal.pone.0315434.g006

Internal test dataset

We designated an internal dataset of BUS images to test the performance of BUSClean's knowledge extraction and artifact detection on unseen data. The internal test dataset comprises 430 unseen BUS images, selected via a simple random sample of all HIPIMR images (excluding the development set) from all women with a negative, benign, or probably benign (BI-RADS 1, 2, or 3) BUS visit within one year of a negative screening mammography visit and no personal history of breast cancer. Images from 391 patients were selected for inclusion; 174 of these patients have images in both the development and internal testing sets.

Case study dataset

We also examine the performance of BUSClean on an external dataset, both to present a case study of how the tool may be used by deep learning researchers in the field and to further test the performance of BUSClean's knowledge extraction and artifact detection on unseen data. The external test dataset for the case study comprises the 780-image BUSI dataset from [22]. This publicly available dataset of BUS images was collected in 2018 at Baheya Hospital for Early Detection and Treatment of Women's Cancer, Cairo (Egypt), with LOGIQ E9 and LOGIQ E9 Agile ultrasound systems. In contrast to the internal test and development datasets, the external test dataset was collected experimentally, rather than opportunistically from clinical care. Table 1 displays population counts for all tested characteristics on the BUSI dataset.

Table 1. Sample characteristics for the internal test and case study (BUSI) [22] datasets.

https://doi.org/10.1371/journal.pone.0315434.t001

The BUSI dataset originates from outside the U.S. (Egypt) and therefore is not expected to follow the ACR BUS labeling guidelines. Thus, the case study tested text recognition only, not classification into the ACR BUS labeling characteristics. The scan filtering and artifact detection (specifically caliper detection) methods were additionally tested on the case study dataset. Due to the high level of curation of the BUSI dataset, it can be presumed to contain no invalid scans, and the invalid scan filtering method was not tested on it. BUSClean's functions are modular and can be selectively applied by researchers according to the characteristics of their BUS data.

Statistical analysis

The ground truth for all tasks except scan area cropping was determined through human annotation. The development, case study, and internal testing datasets were labeled for scan abnormalities and modality by two authors (A.B. and K.H.) with disagreements in labeling resolved through adjudication. Performance was assessed via computation of sensitivity and specificity for each individual binary classification task (i.e., labeling a scan as invalid/not invalid).
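
For reference, sensitivity and specificity for each binary flag follow the standard definitions; the sketch below spells them out (this is generic code, not from the BUSClean repository).

```python
# Standard sensitivity/specificity for a binary flag, written out for clarity.
import numpy as np

def sens_spec(y_true: np.ndarray, y_pred: np.ndarray):
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    return tp / (tp + fn), tn / (tn + fp)       # sensitivity, specificity
```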

Results

Internal test dataset

In the internal test set, 82.33% of BUS images were captured on a Philips Medical Systems IU22 system; 10.47% on a Philips ATL HDI 5000; 5.6% on a Toshiba Aplio XG; and 1.6% on a Siemens ACUSON S2000.

The scan filtering, knowledge extraction, and artifact detection pipelines were tested on the internal test dataset. Table 1 displays population counts for all tested characteristics on the internal dataset.

Performance

Confusion matrices and statistics of BUSClean performance on the internal test dataset are presented in Table 2. BUSClean identified the laterality, orientation, distance, position, axilla, lesion measurement, and procedural text properties with 99.2%, 97.6%, 96.7%, 100%, 95.8%, 97.5%, and 100% sensitivity and 98.2%, 100%, 98.7%, 100%, 100%, 98.3%, and 100% specificity, respectively (for each property) on the internal test dataset. Scans imaged with an enhanced scan mode (non-B-mode; elastography scans or scans with blood flow highlighting) were identified with 100% sensitivity and 99.5% specificity in the internal test dataset. Lesion calipers, dual-view scans, and invalid scans were detected with 96.7%, 100%, and 100% sensitivity and 93.3%, 98.6%, and 100% specificity in the internal dataset.

Table 2. Confusion matrix and performance statistics of BUSClean performance on the internal test dataset.

https://doi.org/10.1371/journal.pone.0315434.t002

Confusion matrices and statistics of BUSClean performance on the case study/BUSI dataset are presented in Table 3. BUSClean recognized the presence of text with 88.6% sensitivity and 98.3% specificity in the case study dataset. Scans imaged with an enhanced scan mode were identified with 90.9% sensitivity and 99.9% specificity in the case study dataset. Lesion calipers were detected with 43.3% sensitivity and 93.3% specificity in the case study dataset using out-of-the-box BUSClean.

Table 3. Confusion matrix and performance statistics of BUSClean performance on the case study dataset/BUSI dataset [22].

https://doi.org/10.1371/journal.pone.0315434.t003

Discussion

Case study

The performance of BUSClean on the internal test dataset of unseen images was high. However, there will always be new types of annotations to handle, so researchers should anticipate needing to customize the functions BUSClean provides to their needs; BUSClean's modular design and open-source license facilitate this. We recommend that researchers first apply BUSClean to a small test dataset to evaluate performance on data from any new source. Custom filtering, cleaning, or knowledge extraction methods can be easily added to the processing pipeline, and users are encouraged to share these methods with the community. In the BUSI dataset case study, sensitivity in lesion caliper detection using the out-of-the-box BUSClean method decreased to under 50% (43.3%). Visual inspection of the false negatives revealed that all but two were missed because of the style of spanning dotted line used to connect markers in the external test dataset (the remaining two were due to tissue brightness masking the caliper): the dotted line caused the internal line and markers to be detected as one object after dilation and therefore be missed by BUSClean. See Fig 7 for example false negative scans. This style of annotation was not present in the development dataset and thus was not accounted for in the original BUSClean code.

Fig 7. False-negative caliper detections in case study.

Example false negative (FN) scans from the external test (BUSI) [22] on the lesion caliper detection task. The resulting binary shapes (left) are too large to be detected by BUSClean’s caliper detection method.

https://doi.org/10.1371/journal.pone.0315434.g007

We implemented a new lesion caliper detection method for the BUSI dataset which additionally performed line detection using the Hough Transform through OpenCV [6]; a caliper was detected if at least two detected, non-parallel lines intersected. This change improved the sensitivity to 92.1% (TP = 117 and FN = 10) while only slightly decreasing specificity, to 92.3% (TN = 603 and FP = 50). This case study illustrates how BUSClean should be implemented, tested, and customized for use on new research datasets.
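
A sketch of this adaptation using OpenCV's probabilistic Hough transform. All parameter values are illustrative, and the undirected angle test is our proxy for "intersecting, non-parallel" line pairs (non-parallel lines always intersect when extended).

```python
# Hedged sketch of the case-study adaptation: Hough line detection on the
# enhanced caliper image, flagging a caliper when two non-parallel line
# segments are found. All parameter values are illustrative.
import cv2
import numpy as np

def has_crossing_lines(enhanced: np.ndarray) -> bool:
    lines = cv2.HoughLinesP(enhanced, rho=1, theta=np.pi / 180,
                            threshold=40, minLineLength=20, maxLineGap=5)
    if lines is None or len(lines) < 2:
        return False
    # Undirected segment angles, folded into [0, pi).
    angles = [np.arctan2(y2 - y1, x2 - x1) % np.pi
              for x1, y1, x2, y2 in lines[:, 0]]
    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            d = abs(angles[i] - angles[j])
            d = min(d, np.pi - d)            # undirected angle difference
            if d > np.deg2rad(10):           # non-parallel pair -> caliper
                return True
    return False
```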

Limitations

The main limitation of BUSClean is that there is no guarantee its methods will transfer to new clinical BUS data-generating distributions; however, the open-source nature of the software enables developers to easily adapt these methods to future datasets. Despite our dataset size and diversity, researchers may encounter new types of lesion calipers or scan abnormalities in their datasets and should monitor BUSClean's classifications to ensure adequate performance.

A second limitation of the BUSClean software is the lack of a built-in cropping method for removing lesion calipers from scans. It is plausible that BUS scans may have lesion calipers on the edge of the scan area, leaving a significant amount of uninterrupted tissue which could be fed to BUS AI pipelines without transferring possibly harmful correlations to model learning. However, we found in our internal dataset that this was not a common occurrence; lesions are frequently the only structure of interest in the scan and likely to be centered by the sonographer for ease of interpretation, leaving little uncorrupted tissue to be cropped and used for model training.

The last limitation of BUSClean is its focus on ACR-defined BUS text classification fields and on English text. This limitation can be addressed by individual developers, as BUSClean can easily be extended to new annotations and languages by defining new regular expressions and substituting a different OCR model.

Conclusions

In this work we presented BUSClean, an open-source software solution for the curation of clinical BUS datasets for AI model training and evaluation. We defined a pre-processing algorithm consisting of scan filtering, scan cleaning, and knowledge extraction methods. The system was evaluated on two held-out datasets for performance in identifying text fields and BUS scan types. We also demonstrated a case study showing the intended use of BUSClean on new clinical BUS datasets. BUSClean will help researchers curate the image datasets necessary for developing robust AI systems.

Acknowledgments

The authors would like to acknowledge Dr. Thomas K. Wolfgruber for preparation of the HIPIMR BUS data and the DICOM image extraction process.

References

  1. Guo M, Wang Y, Yang Q, Li R, Zhao Y, Li C, et al. Normal Workflow and Key Strategies for Data Cleaning Toward Real-World Data. Interactive Journal of Medical Research. 2023;12(1):e44310.
  2. Willemink MJ, Koszek WA, Hardell C, Wu J, Fleischmann D, Harvey H, et al. Preparing medical imaging data for machine learning. Radiology. 2020;295(1):4–15. pmid:32068507
  3. Șerbănescu M-S, Rotaru-Zăvăleanu A-D, Istrate-Ofițeru A-M, Maria BE-I-A, Enache I-A, Nagy RD, et al., editors. Medical Image Data Cleansing for Machine Learning: A Must in the Evidence-Based Medicine? International Conference on Advancements of Medicine and Health Care through Technology; 2022: Springer.
  4. Shen Y, Shamout FE, Oliver JR, Witowski J, Kannan K, Park J, et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nature Communications. 2021;12(1). pmid:34561440
  5. Shamout FE, Shen Y, Witowski JS, Oliver JR, Kannan K, Wu N, et al., editors. The NYU Breast Ultrasound Dataset v1.0. 2021.
  6. Bradski G. The OpenCV library. Dr Dobb's Journal: Software Tools for the Professional Programmer. 2000;25(11):120–3.
  7. Clark A. Pillow (PIL fork) documentation. Read the Docs. 2015.
  8. Wallis D, Buvat I. Clever Hans effect found in a widely used brain tumour MRI dataset. Medical Image Analysis. 2022;77:102368. pmid:35063892
  9. Kovács DP, McCorkindale W, Lee AA. Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias. Nature Communications. 2021;12(1):1695. pmid:33727552
  10. Bottani S, Burgos N, Maire A, Saracino D, Ströer S, Dormont D, et al. Evaluation of MRI-based machine learning approaches for computer-aided diagnosis of dementia in a clinical data warehouse. Medical Image Analysis. 2023;89:102903. pmid:37523918
  11. Baek J, O'Connell AM, Parker KJ. Improving breast cancer diagnosis by incorporating raw ultrasound parameters into machine learning. Machine Learning: Science and Technology. 2022;3(4):045013. pmid:36698865
  12. Gu Y, Xu W, Lin B, An X, Tian J, Ran H, et al. Deep learning based on ultrasound images assists breast lesion diagnosis in China: a multicenter diagnostic study. Insights into Imaging. 2022;13(1). pmid:35900608
  13. Kim J, Kim HJ, Kim C, Lee JH, Kim KW, Park YM, et al. Weakly-supervised deep learning for ultrasound diagnosis of breast cancer. Scientific Reports. 2021;11(1). pmid:34934144
  14. Zhang Q, Xiao Y, Dai W, Suo J, Wang C, Shi J, et al. Deep learning based classification of breast tumors with shear-wave elastography. Ultrasonics. 2016;72:150–7. Epub 2016 Aug 6. pmid:27529139
  15. Zhang Q, Song S, Xiao Y, Chen S, Shi J, Zheng H. Dual-mode artificially-intelligent diagnosis of breast tumours in shear-wave elastography and B-mode ultrasound using deep polynomial networks. Medical Engineering & Physics. 2019;64:1–6. pmid:30578163
  16. Fujioka T, Katsuta L, Kubota K, Mori M, Kikuchi Y, Kato A, et al. Classification of Breast Masses on Ultrasound Shear Wave Elastography using Convolutional Neural Networks. Ultrasonic Imaging. 2020;42(4–5):213–20. pmid:32501152
  17. Zhang X, Liang M, Yang Z, Zheng C, Wu J, Ou B, et al. Deep Learning-Based Radiomics of B-Mode Ultrasonography and Shear-Wave Elastography: Improved Performance in Breast Mass Classification. Frontiers in Oncology. 2020;10:1621. pmid:32984032
  18. D'Orsi C, Bassett L, Feig S. Breast Imaging Reporting and Data System (BI-RADS). Breast Imaging Atlas, 4th edn. Reston: American College of Radiology; 2018.
  19. EasyOCR: Jaided AI; 2023. Available from: https://github.com/JaidedAI/EasyOCR.
  20. Lee SE, Lee E, Kim E-K, Yoon JH, Park VY, Youk JH, et al. Application of Artificial Intelligence Computer-Assisted Diagnosis Originally Developed for Thyroid Nodules to Breast Lesions on Ultrasound. Journal of Digital Imaging. 2022. pmid:35902445
  21. Bong JH, Kim TH, Jeong S. Deep learning model for the diagnosis of breast cancers smaller than 1 cm with ultrasonography: integration of ultrasonography and clinical factors. Quantitative Imaging in Medicine and Surgery. 2023;13(4):2486–95. pmid:37064369
  22. Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A. Dataset of breast ultrasound images. Data in Brief. 2020;28:104863. pmid:31867417
  23. Chowdary J, Yogarajah P, Chaurasia P, Guruviah V. A Multi-Task Learning Framework for Automated Segmentation and Classification of Breast Tumors From Ultrasound Images. Ultrasonic Imaging. 2022;44(1):3–12. pmid:35128997
  24. Chen G, Li L, Dai Y, Zhang J, Yap MH. AAU-net: An Adaptive Attention U-net for Breast Lesions Segmentation in Ultrasound Images. IEEE Transactions on Medical Imaging. 2022. pmid:36455083
  25. Du R, Chen Y, Li T, Shi L, Fei Z, Li Y. Discrimination of Breast Cancer Based on Ultrasound Images and Convolutional Neural Network. Journal of Oncology. 2022;2022:1–9. pmid:35345516
  26. Cui W, Peng Y, Yuan G, Cao W, Cao Y, Lu Z, et al. FMRNet: A fused network of multiple tumoral regions for breast tumor classification with ultrasound images. Medical Physics. 2022;49(1):144–57. pmid:34766623