Fig 1.
(a) Qualified – fiber: 6.0, specks: 4.0, mass: 4.0 (b) Unqualified – fiber: 1.5, specks: 4.0, mass: 1.5.
Table 1.
Dataset split and qualification status for model training, test, and evaluation.
Table 2.
Distribution of lesion scores by feature type across training, validation, and evaluation datasets.
Fig 2.
Grid-level and group-level score distribution of mammography phantom images.
Fig 3.
Conversion from monochrome1 to monochrome2 with background standardization.
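The inversion step behind this conversion can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `to_monochrome2`, the nested-list image representation, and the default `bits_stored=12` are assumptions, and the background standardization mentioned in the caption is not covered.

```python
def to_monochrome2(pixels, photometric, bits_stored=12):
    """Return pixel rows in the MONOCHROME2 convention (low value = dark).

    `pixels` is a list of rows of non-negative integers. `bits_stored` is an
    assumed bit depth; in practice it would come from the image header.
    """
    if photometric == "MONOCHROME2":
        # Already in the target convention: return a copy unchanged.
        return [list(row) for row in pixels]
    if photometric != "MONOCHROME1":
        raise ValueError(f"unsupported photometric interpretation: {photometric}")
    max_value = (1 << bits_stored) - 1
    # MONOCHROME1 stores low values as bright, so flip around the maximum.
    return [[max_value - p for p in row] for row in pixels]


converted = to_monochrome2([[0, 4095], [100, 200]], "MONOCHROME1")
```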
Fig 4.
Phantom detection and alignment via thresholding and contour detection.
(a) Original image. (b) The phantom contour is detected using Otsu's method with a threshold of 97 (blue). The center of mass of the region is computed and the side closest to it is determined (red). (c) Rotation is applied to correct the phantom orientation. (d) The image is cropped to the region of interest.
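The thresholding and center-of-mass steps in panel (b) can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the image is an 8-bit nested list, "closest side" is interpreted here as the image edge nearest to the center of mass, and the helper names `otsu_threshold`, `centroid`, and `closest_side` are hypothetical. The rotation and cropping of panels (c) and (d) are omitted.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for row in pixels:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def centroid(mask):
    """Center of mass (row, column) of the truthy pixels in a binary mask."""
    sum_y = sum_x = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_y += y
                sum_x += x
                count += 1
    if count == 0:
        raise ValueError("mask contains no foreground pixels")
    return sum_y / count, sum_x / count


def closest_side(mask):
    """Name of the image edge nearest to the mask's center of mass (assumed reading)."""
    height, width = len(mask), len(mask[0])
    cy, cx = centroid(mask)
    distances = {"top": cy, "bottom": height - 1 - cy,
                 "left": cx, "right": width - 1 - cx}
    return min(distances, key=distances.get)


# Toy 6x6 image: dark background with a bright block touching the left edge.
image = [[10] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (0, 1):
        image[y][x] = 200

threshold = otsu_threshold(image)
mask = [[p > threshold for p in row] for row in image]
side = closest_side(mask)
```

Once the nearest side is known, the image could be rotated in multiples of 90 degrees so that this side always faces the same direction.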
Fig 5.
Lesion area isolation and precise cropping.
(a) The line profile is analyzed to select only the region containing the gel, yielding (b) the selected phantom image.
Fig 6.
Normalization of pixel intensities and image standardization.
(a) The pixel intensities of the rhombus were used for window adjustment so that outliers, such as the disk in the image, are ignored. (b) The maximum and minimum values of the rhombus were used for normalization. (c) Normalized image.
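A minimal sketch of this windowing step, assuming the rhombus pixels have already been extracted into a flat list: intensities are rescaled so that the rhombus minimum maps to 0 and its maximum to 1, and anything outside that window is clipped. The function name `normalize_to_reference` and the [0, 1] output range are assumptions.

```python
def normalize_to_reference(pixels, reference_values):
    """Min-max normalize `pixels` using the range of `reference_values`.

    Values outside the reference window (for example a bright disk) are
    clipped to 0.0 or 1.0 instead of stretching the window.
    """
    low, high = min(reference_values), max(reference_values)
    if high == low:
        raise ValueError("reference region has no intensity range")
    span = high - low
    return [[min(1.0, max(0.0, (p - low) / span)) for p in row]
            for row in pixels]


# The 4000-valued outlier is clipped rather than compressing the rest.
normalized = normalize_to_reference([[0, 100, 150, 200, 4000]], [100, 200])
```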
Fig 7.
Visualization of the full workflow.
Fig 8.
Confusion matrix of model predictions on the evaluation dataset for (a) fiber, (b) specks, and (c) mass.
Accuracy, precision, and recall values are shown for each score class.
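For reference, per-class values of this kind can be derived from a confusion matrix as sketched below. The sketch assumes rows are true classes and columns are predicted classes, and it treats each class one-vs-rest; whether the figure uses exactly this definition of per-class accuracy is not stated here. The matrix values are made up for illustration.

```python
def per_class_metrics(cm):
    """One-vs-rest accuracy, precision, and recall for each class.

    `cm[i][j]` counts samples of true class i predicted as class j
    (an assumed orientation).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        results.append({
            "accuracy": (tp + tn) / total,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        })
    return results


# Illustrative 3-class matrix for scores 0.0, 0.5, and 1.0 (not real data).
example_cm = [[8, 2, 0],
              [1, 6, 3],
              [0, 1, 9]]
metrics = per_class_metrics(example_cm)
```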
Fig 9.
Receiver operating characteristic (ROC) curves and corresponding area under the curve (AUC) of the model predictions on the evaluation dataset for (a) fiber, (b) specks, and (c) mass.
The existence of a lesion and an abnormality insufficient for a 1.0-score classification are shown as orange and green lines, respectively. The threshold for each output is chosen to maximize the F1 score. Subimages with a 0.0-score were excluded when plotting the abnormality ROC curve.
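Choosing a threshold that maximizes the F1 score can be sketched as an exhaustive search over the observed output values. This is one straightforward way to do it, not necessarily the procedure used for the figure; the function name `best_f1_threshold`, the `score >= threshold` decision rule, and the sample data are assumptions.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1 when predicting 1 if score >= threshold.

    `scores` are model outputs and `labels` are 0/1 ground truth values.
    Ties are resolved in favor of the lowest threshold.
    """
    best_threshold, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


# Illustrative outputs and labels (not data from the study).
chosen_threshold, chosen_f1 = best_f1_threshold(
    [0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]
)
```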
Fig 10.
Confusion matrix of pass/fail predictions by feature.
Each feature group, (a) fiber, (b) specks, (c) mass, and (d) all three features combined, is scored by the model and classified as pass or fail. Only images that meet the requirements for all three features are considered qualified.
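The pass/fail rule can be sketched as a per-feature comparison followed by a logical AND. The minimum scores used below (fiber 4.0, specks 3.0, mass 3.0) follow the commonly cited ACR accreditation phantom criteria and are an assumption, since the exact requirements are not stated in these captions; the names `DEFAULT_MINIMUMS` and `qualify` are hypothetical. The two inputs are the example scores from Fig 1.

```python
# Assumed minimum scores per feature (ACR-style criteria, not from the captions).
DEFAULT_MINIMUMS = {"fiber": 4.0, "specks": 3.0, "mass": 3.0}


def qualify(scores, minimums=DEFAULT_MINIMUMS):
    """Return (per-feature pass flags, overall qualification).

    An image is qualified only if every feature meets its minimum score.
    """
    per_feature = {feature: scores[feature] >= minimum
                   for feature, minimum in minimums.items()}
    return per_feature, all(per_feature.values())


# Example scores taken from Fig 1 (a) and (b).
flags_a, qualified_a = qualify({"fiber": 6.0, "specks": 4.0, "mass": 4.0})
flags_b, qualified_b = qualify({"fiber": 1.5, "specks": 4.0, "mass": 1.5})
```

Under these assumed minimums, the Fig 1 (b) image passes on specks but fails on fiber and mass, which is consistent with its unqualified label.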
Fig 11.
Visualization of (a) the phantom image and (b) corresponding saliency map.
The annotations in the top-left corner of each subimage in (a) represent the labels, while those in (b) represent the predictions, where 0, 1, and 2 correspond to scores of 0.0, 0.5, and 1.0, respectively. The saliency map is visualized as a heatmap, with the maximum value kept consistent across subimages of the same feature.
Fig 12.
The model ignores artifacts and concentrates on the fiber itself.
(a) A subimage of a fiber with artifacts such as letters and disks and (b) the same subimage with the artifacts masked were scored by the model. The prediction and the saliency map were consistent before and after masking the artifacts, even though the model had been trained on images containing artifacts.
Fig 13.
Model interpretation for fiber scoring based on visible length.
(a) Saliency maps and class probabilities as the fiber is masked at different break positions; central breaks trigger the highest abnormality detection. (b) Saliency maps with corresponding class probabilities as fiber length increases; once a sufficient length is visible, the model assigns a score of 1.0 (no abnormality). Solid lines indicate predictions above the decision threshold, whereas dotted lines indicate predictions below it.
Fig 14.
Model interpretation for speck scoring, with predictions grouped as “Existence” (lesion presence) and “Abnormality” (lesions insufficient for full-point qualification).
Saliency maps and class probabilities are shown as the number of visible specks increases from 1 to 6 and speck visibility increases from 0 to 1. The model predicts a 0.5- or 1.0-score at ≥2 visible specks, and a 1.0-score at ≥3–4 specks. Solid lines indicate predictions above the decision threshold, whereas dotted lines indicate predictions below it.
Fig 15.
Model interpretation for mass scoring.
(a) Saliency maps show correct shape localization but a size bias: larger lesions are predicted as 1.0-score and smaller lesions as 0.5-score despite identical labels. (b) The probability of abnormality increases as the image is deformed into a square shape. Both sizes show increasing abnormality with deformation, with the smaller lesion consistently yielding higher probabilities. The final probabilities converge once deformation is complete. Solid lines indicate predictions above the decision threshold, whereas dotted lines indicate predictions below it.