
Improving detection accuracy of heterogeneity in biological tissues through the combination of modulation-demodulation frame accumulation techniques and enhanced VGG16

  • Fulong Liu ,

    Roles Data curation, Formal analysis, Methodology, Project administration, Resources, Software, Visualization, Writing – original draft

    100002022053@xzhmu.edu.cn

    Affiliation Xuzhou Medical University, School of Medical Information and Engineering, Xuzhou, Jiangsu, China

  • Siyuan Huang,

    Roles Data curation, Methodology, Writing – original draft

    Affiliation Xuzhou Medical University, School of Medical Information and Engineering, Xuzhou, Jiangsu, China

  • Jie Gao,

    Roles Data curation, Software, Writing – original draft

    Affiliation Xuzhou Medical University, School of Medical Information and Engineering, Xuzhou, Jiangsu, China

  • Xin Zhou,

    Roles Data curation, Software, Writing – review & editing

    Affiliation Xuzhou Medical University, School of Medical Information and Engineering, Xuzhou, Jiangsu, China

  • Junqi Wang

    Roles Conceptualization, Formal analysis, Resources, Software, Writing – review & editing

    Affiliation Xuzhou Medical University, School of Medical Information and Engineering, Xuzhou, Jiangsu, China

Abstract

Light undergoes strong absorption and scattering while propagating through biological tissues, making it difficult to identify heterogeneities in multi-spectral images. This paper achieves a gradual improvement in the classification accuracy of heterogeneities on multi-spectral transmission images (MTI) through the combination of modulation-demodulation frame accumulation (M_D-FA) techniques and enhanced Visual Geometry Group 16 (VGG16) models. Firstly, experiments are designed to collect MTI of phantoms. Then, the images are preprocessed by different combinations of frame accumulation (FA) and modulation and demodulation (M_D) techniques. Finally, multi-spectral fusion pseudo-color images obtained from U-Net semantic segmentation are input into the original and enhanced VGG16 network models for heterogeneous classification. The experimental results show that while both FA and M_D significantly improve the image quality individually, their combination (M_D-FA) proves superior, yielding the highest signal-to-noise ratio (SNR) and the most accurate heterogeneous classification. Compared to the original VGG16 model, the enhanced VGG16 models gradually improved the classification accuracy. Most importantly, the 3.5 Hz M_D-FA images processed by the Visual Geometry Group 16-Batch Normalization-Squeeze and Excitation-Global Average Pooling (VGG16_BN_SE_GAP) model achieved the highest classification accuracy of 97.57%, significantly outperforming results using FA or M_D alone. In summary, this paper utilizes different combinations of FA and M_D techniques to further improve the accuracy of deep learning networks in heterogeneous classification on multi-spectral images, which promotes the clinical application of multi-spectral transmission imaging technology in early breast cancer detection.

Introduction

Breast cancer has become the most common malignancy among women globally, having surpassed lung cancer, with an estimated 2.3 million new cases (11.7% of all cancer diagnoses) reported in 2020 [1]. In China, breast cancer is the leading cause of cancer-related morbidity and mortality among women, with its age-standardized rate rising by 3.3% annually over the past decade [2,3]. Regular screening is an effective method for early detection of tumors and improving patient prognosis. Early treatment of breast cancer not only preserves female breast tissue but also significantly improves the cure rate compared to patients diagnosed in the intermediate or advanced stages [4]. However, the physical structure of early breast tumors is often indistinct in medical images, making it difficult to identify and locate tumors within breast tissue in terms of position and size. Moreover, existing clinical examination techniques struggle to simultaneously meet the characteristics required for early detection of breast tumors, such as regularity, non-radiation, low cost, convenience and ease of implementation. For example, the X-rays commonly used in clinical practice are unsuitable for regular early breast cancer screening, mainly because when X-rays irradiate the body they react with substances in the tissue, destroy cellular structures, and can cause permanent damage. Additionally, X-ray examinations have relatively low sensitivity for dense breast tissue in younger women [5]. Ultrasound imaging lacks standardized techniques, is time-consuming, and its diagnostic ability depends on the experience and skill of the technician [6]. Computed Tomography (CT) takes a long time to scan, and only a limited number of slices can be scanned during the effective time of the contrast agent, so the spatial resolution of the images cannot be guaranteed [7]. Magnetic Resonance Imaging (MRI) is slow, costly and insensitive to calcifications and cortical bone lesions, posing challenges for quantitative diagnosis [8].

In recent years, optical imaging has gradually become a research hotspot and has been widely applied in many fields. In comparison to conventional clinical imaging methods, optical imaging methods possess the following significant advantages [9]: a) The use of safe, non-ionizing radiation for non-invasive tissue detection. b) The ability to display soft tissue contrast based on optical properties. c) Potential for continuous monitoring of tissue lesions. d) High spatial resolution (with lateral resolution less than 1 micron in the visible range). Moreover, breast tissue is semi-transparent and highly transmissive. During optical transmission imaging, tumor-associated neovascularization and elevated hemoglobin concentrations alter the optical properties (e.g., absorption and scattering coefficients) of breast tissue. These changes manifest as localized shadows in transmission images, termed heterogeneities [10]. For instance, malignant tissues exhibit higher absorption at specific wavelengths (e.g., green and near-infrared) due to dense microvasculature, whereas normal tissues show uniform transmission patterns. Our phantom experiments simulate these optical disparities using materials with controlled absorption/scattering profiles (e.g., potato blocks for low-density regions, pumpkin blocks for high-density anomalies), enabling validation of the method’s capability to detect tumor-mimicking structures. Therefore, optical transmission imaging provides a clinically non-invasive detection method for screening early-stage breast cancer.

Multi-spectral non-destructive transmission optical imaging has become a research hotspot due to its real-time, non-invasive, safe, specific and highly sensitive advantages, and has been widely applied in many fields [11,12]. However, there is relatively limited research on the application of multi-spectral transmission images (MTI) in the medical field. This is mainly due to the absorption and scattering characteristics of tissues, which strictly limit the transmission depth of light sources. During the transmission process, light is absorbed by water, macromolecules (such as proteins) and pigments (such as melanin and hemoglobin) in biological tissues without being re-emitted, and the photons are lost, causing the image to become dim. The presence of these components restricts the propagation of light, making it difficult to obtain high-information images. Currently, modulation-demodulation (M_D) and frame accumulation (FA) technologies have become the most effective methods to enhance low-light signals in the process of obtaining MTI with various low-light level image detection devices. Li et al significantly improved the signal-to-noise ratio (SNR) and resolution of low-light images using FA and shaping signal techniques [13–15]. In 2019, Zhang et al reported the effectiveness of combining FA with deep learning frameworks (e.g., Faster R-CNN and SSD) for multispectral heterogeneity detection, achieving over 99.9% mean Average Precision (mAP) in simplified two-class scenarios [16]. Additionally, their earlier work introduced a joint preprocessing algorithm integrating FA and edge enhancement, which improved the peak signal-to-noise ratio (PSNR) to 57.3 dB and significantly enhanced edge contrast for transmission tissue images [17].
In 2020, our team proposed a method that combines modulation-demodulation-frame accumulation technique (MDFAT), spatial pyramid matching (SPM) model and deep learning to further improve the accuracy of heterogeneous recognition in multi-spectral images while enhancing the low-light image information content [18].

Additionally, with the advancement of machine learning and hardware capabilities, deep learning-based image object classification methods have been widely applied. Deep learning extracts high-level abstract features from images through layer-by-layer convolution and uncovers hidden properties within targets, which will facilitate the development of MTI in early breast tumor detection, bringing new possibilities to medical diagnosis [19–21]. In 2019, Ting et al developed a new algorithm called convolutional neural network (CNN) improved breast cancer classification (CNNI-BCC) using digital X-ray images [22]. Shen et al developed an end-to-end deep learning algorithm that can accurately detect breast cancer in screening mammograms while eliminating the reliance on rare lesion annotations. Moreover, this study shows that classifiers based on Visual Geometry Group 16 (VGG16) and ResNet can complement each other and preserve the full resolution of digital mammography images [23]. SanaUllah et al used the concept of transfer learning to propose a new deep learning framework for the detection and classification of breast cancer in breast cytological images. In the proposed framework, pre-trained CNN architectures, namely GoogLeNet, Visual Geometry Group Network (VGGNet) and Residual Networks (ResNet), were used to extract features from images and feed them into the fully connected layer. Malignant and benign cells were classified using average pooling classification [24]. In 2022, Montaha et al proposed a BreastNet18 model based on fine-tuned VGG16 that varies hyperparameters and layer structures. After an ablation study of the proposed model and the selection of appropriate parameter values for the pre-processing algorithm, the accuracy of the model is superior to some of the most advanced methods available [25].
In 2023, Alexandru et al compared the efficiency of six state-of-the-art, fine-tuned deep learning models (ResNet-50, Inception-V3, Inception-ResNet-V2, MobileNet-V2, VGG-16 and DenseNet-121). These models can use transfer learning to classify breast tissue in ultrasound images into benign, malignant and normal categories. Among them, VGG-16 model obtained a good detection accuracy [26]. In 2024, Sreedhar et al proposed a hybrid CNN model combining ResNet-34, FE-VGG-16 and M-AlexNet to classify histopathological images of breast lesions in a standard dataset as benign or malignant to improve diagnostic accuracy [27]. In summary, the VGG16 network, as a deep convolutional neural network, efficiently explores the information within transmission images and identifies potential anomalous regions [28]. Moreover, the VGG16 network model demonstrates excellent performance on large-scale image datasets and can further enhance its performance through fine-tuning. Additionally, VGG16 exhibits strong generalization capabilities, making it suitable for various sizes and types of image data. Therefore, this paper selects the VGG16 network as the foundational model for heterogeneous classification and detection on MTI.

This paper achieves a gradual improvement in the classification accuracy of heterogeneities on MTI by first demonstrating that the synergistic combination of modulation-demodulation and frame accumulation (M_D-FA) techniques provides a greater benefit than either preprocessing method alone, and then by applying these optimized images to enhanced VGG16 models. Specifically, experiments are designed to collect MTI of phantoms, and various combinations (FA, M_D, and M_D-FA) are applied for image preprocessing to rigorously compare their efficacy. The multi-spectral fusion pseudo-color images obtained from U-Net semantic segmentation are then input into both the original and enhanced VGG16 network models for heterogeneous classification. Both FA and M_D techniques significantly improve the image quality, and the enhanced VGG16 models gradually increase the classification accuracy of heterogeneities on transmission images compared to the original VGG16 model. In conclusion, this paper further improves the accuracy of deep learning networks in heterogeneous classification on multi-spectral images by utilizing different combinations of FA and M_D techniques, building upon the enhanced image quality. The framework of the heterogeneous classification detection model in biological tissues is illustrated in Fig 1.

Fig 1. The overall classification model framework diagram.

https://doi.org/10.1371/journal.pone.0329009.g001

Related method

U-Net network

In medical image segmentation tasks, U-Net is one of the most successful methods. Unlike fully convolutional networks (FCN), the key differences in U-Net lie in its symmetric decoder and its skip connections. The network structure of U-Net is fully symmetric, with similar structures on the left and right sides. Through the skip connections, U-Net fully transfers the features extracted by the encoder part to the decoder part, using concatenation operations to merge features [29]. The architecture of U-Net is shown in Fig 2. Firstly, detailed and contour information of the image is obtained in the encoder part. Then, the extracted features are passed to the decoder part through the skip connection stage. Finally, the decoder part combines features from multiple scales for feature restoration. The widespread application of U-Net in the field of medical image segmentation is attributed to: (1) its ability to achieve decent results with minimal data for training. (2) the effectiveness of its network structure.
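As a plain-Python sketch (the helper names are illustrative; a real U-Net uses learned convolutions rather than these toy operations), the encoder downsampling, decoder upsampling, and channel-concatenating skip connection can be illustrated as:

```python
def max_pool2(img):
    """Encoder downsampling: 2x2 max pooling on a 2-D list."""
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, len(img[0]), 2)] for i in range(0, len(img), 2)]

def upsample2(img):
    """Decoder upsampling: nearest-neighbour, doubles height and width."""
    rows = [[v for v in row for _ in (0, 1)] for row in img]
    return [list(row) for row in rows for _ in (0, 1)]

def skip_concat(encoder_maps, decoder_maps):
    """U-Net skip connection: concatenate encoder and decoder feature maps
    along the channel axis (both lists of 2-D maps at the same resolution)."""
    return encoder_maps + decoder_maps

enc = [[1.0, 2.0], [3.0, 4.0]]
dec = upsample2(max_pool2(enc))          # down then back up: same 2x2 shape
merged = skip_concat([enc], [dec])
assert len(merged) == 2                  # channels doubled by the skip connection
assert dec == [[4.0, 4.0], [4.0, 4.0]]   # pooling kept the strongest response
```

The concatenation (rather than FCN-style addition) is what lets the decoder recover fine detail lost during downsampling.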

Enhanced VGG16 network

The VGG16 network architecture is shown in Fig 3 [30]. Its main characteristics are as follows: the convolutional layers use convolutional kernels with consistent parameters so that the width and height of the tensors remain consistent between convolutional layers. All pooling layers use the same kernel parameters, with size 2 × 2 and stride 2, which halves the width and height after each pooling operation and reduces the number of parameters. Small convolutional kernels are stacked instead of large ones to achieve the same receptive field. For example, two 3 × 3 convolutional kernels replace a 5 × 5 convolutional kernel with the same receptive field, and three 3 × 3 convolutional kernels replace a 7 × 7 convolutional kernel with the same receptive field. Therefore, when the feature extraction effect is similar, multiple small convolutional kernels have fewer learnable parameters than a single large convolutional kernel. This also improves model performance while increasing network depth.
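The receptive-field equivalence and parameter savings described above can be checked with a small arithmetic sketch (pure Python; biases are ignored in the weight counts):

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked kernel x kernel convolutions, stride 1."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1
    return rf

def conv_weights(kernel, channels):
    """Weight count of one conv layer mapping `channels` -> `channels` feature maps."""
    return kernel * kernel * channels * channels

# Two 3x3 convs cover the same 5x5 receptive field as one 5x5 conv,
# and three 3x3 convs cover the same 7x7 receptive field as one 7x7 conv.
assert stacked_receptive_field(2, kernel=3) == 5
assert stacked_receptive_field(3, kernel=3) == 7

# The stacks need fewer weights (example: 64 input/output channels).
C = 64
assert 2 * conv_weights(3, C) < conv_weights(5, C)   # 18*C^2 vs 25*C^2
assert 3 * conv_weights(3, C) < conv_weights(7, C)   # 27*C^2 vs 49*C^2
```

The stacked small kernels also interleave extra non-linearities, which is part of why depth helps.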

Batch Normalization (BN) layer after convolution.

The essence of the learning process in convolutional neural networks (CNNs) is to learn the data distribution. As the depth of network increases, more information can be extracted, leading to more precise classification results. However, this progress also comes with risks. With the increase in the number of convolutional layers, the training duration also becomes longer. This poses a higher risk of overfitting, resulting in poor model recognition performance and reduced network generalization ability. Moreover, the dataset is divided into multiple batches for training rather than being trained all at once during model training. When there are significant differences in the distribution of each batch of training data, the network needs to adapt to different distributions during each training iteration, thus slowing down the training process. To address these issues, the values outputted by each convolutional layer are normalized. This normalization ensures relative stability in the distribution of input data between adjacent convolutional layers, stabilizing the network training process and improving both network generalization ability and training speed [31].
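A minimal sketch of the normalization step described above, applied per channel over one batch (the learnable scale `gamma` and shift `beta` follow the standard BN formulation and are assumptions here, not values from the paper):

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's batch of activations to zero mean / unit variance,
    then apply the learnable scale (gamma) and shift (beta)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]              # activations of one channel across a batch
normed = batch_norm(batch)
assert abs(sum(normed) / len(normed)) < 1e-6                        # mean ~ 0
assert abs(sum(v * v for v in normed) / len(normed) - 1.0) < 1e-3   # variance ~ 1
```

Because every batch is mapped to the same distribution, successive layers no longer have to re-adapt to shifting input statistics, which is what stabilizes and speeds up training.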

Squeeze and Excitation (SE) attention module.

The SE attention module proposed by Hu et al is one of the currently mainstream attention mechanisms [32]. It is a channel attention mechanism that computes weights for the different feature channels of the input image through the network. This allows feature channels to be re-calibrated, emphasizing useful information while filtering out irrelevant information, thereby enhancing the feature representation capability. Additionally, this module is lightweight and can be easily integrated into models, typically after convolutional blocks, introducing only a small amount of model complexity and computational overhead. The SE attention module mainly consists of two parts, Squeeze and Excitation, as illustrated in Fig 4. In the figure, $F_{tr}$ represents the convolution operation, $X$ represents the input, $U$ represents the output, and $C$, $W$ and $H$ represent the number of channels, width and height of the output feature map, respectively; $C'$, $W'$ and $H'$ represent the number of channels, width and height of the input feature map. The convolution formula is shown in Equation (1):

$$u_c = \mathbf{v}_c * \mathbf{X} = \sum_{s=1}^{C'} \mathbf{v}_c^{s} * \mathbf{x}^{s} \qquad (1)$$

Where $u_c$ represents the $c$-th output feature map, $\mathbf{v}_c$ represents the convolutional kernel, $\mathbf{v}_c^{s}$ represents the two-dimensional spatial kernel within the convolutional kernel, and $\mathbf{x}^{s}$ represents the $s$-th input feature map. The generated feature maps undergo a compression operation via a global pooling layer, denoted $F_{sq}$, producing a $1 \times 1 \times C$ vector of real numbers, as shown in Equation (2). To capture correlations between channels, the compressed real numbers undergo an activation operation denoted $F_{ex}$, using $W$ as the parameters to generate weights for all channels. The activation operation is illustrated in Equation (3).

$$z_c = F_{sq}(u_c) = \frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H} u_c(i, j) \qquad (2)$$

$$\mathbf{s} = F_{ex}(\mathbf{z}, W) = \sigma\big(W_2\,\delta(W_1\mathbf{z})\big) \qquad (3)$$

Where $z_c$ represents the compressed real number for channel $c$, $\sigma$ represents the sigmoid activation function, $\delta$ represents the ReLU activation function, and $W_1$ represents the parameter for dimension reduction ($W_2$ restores the original dimension). Finally, the output $\mathbf{s}$ of the activation layer is used as a parameter to measure the importance of each channel. This parameter is applied to the original features of each channel through the $F_{scale}$ operation, achieving the re-scaling of the original feature weights. The formula is shown in Equation (4).

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c\, u_c \qquad (4)$$

Where $\tilde{x}_c$ represents the re-scaled feature map and $s_c$ represents the weight generated by the activation layer. Squeeze, denoted $F_{sq}$ in the figure, refers to the global compressed feature of the current feature map obtained by performing global pooling on the feature map layer. Excitation, denoted $F_{ex}$ in the figure, refers to obtaining the weights of each channel in the feature map through two fully connected layers, and then using the weighted feature map as the input to the next layer of the network. In image classification tasks, models often experience a decrease in overall accuracy due to the omission of small and weak targets. To address this issue, this paper embeds SE attention modules after each convolutional layer in the network, as shown in Fig 5. On the one hand, this has little impact on the overall parameter count of the model: compared to the original model, the addition of SE attention modules increases the parameter count and model size by only 0.05%. On the other hand, it enhances the feature extraction capability of the network, thereby improving the model’s classification accuracy.
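The Squeeze, Excitation and re-scaling steps can be sketched in plain Python on toy-sized tensors (the weight matrices `w1` and `w2` below are made-up illustrative values, not trained parameters):

```python
import math

def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation over a list of C feature maps (each a 2-D list).
    w1: (C/r x C) reduction weights, w2: (C x C/r) restoration weights."""
    # Squeeze: global average pooling per channel -> vector z of length C.
    z = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight s_c per channel.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
         for row in w2]
    # Scale: re-weight each channel's feature map by its learned importance s_c.
    return [[[s[c] * v for v in row] for row in fm]
            for c, fm in enumerate(feature_maps)]

# Tiny example: C = 2 channels, 2x2 maps, reduction ratio r = 2 (hidden size 1).
fmaps = [[[1.0, 1.0], [1.0, 1.0]], [[4.0, 4.0], [4.0, 4.0]]]
w1 = [[0.5, 0.5]]                 # 1 x 2 reduction
w2 = [[1.0], [-1.0]]              # 2 x 1 restoration
out = se_block(fmaps, w1, w2)
assert len(out) == 2 and out[0][0][0] != out[1][0][0]  # channels re-weighted differently
```

The only parameters added per block are the two small FC matrices, which is why embedding SE modules after every convolutional layer barely changes the model size.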

Replacing the Global Average Pooling (GAP) layer.

The VGG16 model consists of 13 convolutional layers and 3 fully connected layers, with a total parameter count of 136,349,440. The parameters of the three fully connected layers account for 89.21% of the total, resulting in excessive consumption of computational resources, reduced learning speed, an increased risk of overfitting, decreased generalization ability, and compromised real-time detection. To address the negative impact of the fully connected layers, a GAP layer is introduced. By replacing the three fully connected layers in VGG16 with a GAP layer, the parameter count is drastically reduced, making the model more robust and reducing the risk of overfitting. After the replacement, both the parameter count and model size decrease by 87.51%.
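A rough sanity check of why the fully connected head dominates: the sketch below counts parameters for the standard ImageNet VGG16 configuration (these counts differ slightly from the totals reported above, which depend on the exact classifier head used):

```python
def conv_params(cin, cout, k=3):
    """Parameter count of one k x k conv layer: weights plus biases."""
    return k * k * cin * cout + cout

# The 13 convolutional layers of the standard VGG16 configuration.
cfg = [(3, 64), (64, 64), (64, 128), (128, 128), (128, 256), (256, 256),
       (256, 256), (256, 512), (512, 512), (512, 512), (512, 512),
       (512, 512), (512, 512)]
conv_total = sum(conv_params(cin, cout) for cin, cout in cfg)

# The three fully connected layers (ImageNet head: 7*7*512 -> 4096 -> 4096 -> 1000).
fc_total = ((7 * 7 * 512) * 4096 + 4096) + (4096 * 4096 + 4096) + (4096 * 1000 + 1000)

total = conv_total + fc_total
assert conv_total == 14_714_688
assert fc_total / total > 0.89        # the FC head dominates the parameter budget

# Replacing the FC head with global average pooling leaves essentially only the
# conv stack, removing the vast majority of parameters.
assert fc_total / total > 0.85
```

GAP also removes the fixed-input-size constraint imposed by the first FC layer, since averaging works at any spatial resolution.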

Experiment

Experimental equipment

The experimental system setup is shown in Fig 6. The system mainly consists of a power supply, a modulator module, a 4 × 4 LED array, a phantom, an industrial camera, a computer (HuiPu) and a black cloth. The power supply is a programmable direct current (DC) stabilized power supply, model hspy-600. The LED array includes wavelengths of 435nm blue light, 546nm green light, 700nm red light and 860nm near-infrared light. The illumination angle of the LED array must ensure coverage of the entire phantom area, and the LED light intensity varies sinusoidally. To simulate breast cancer scenarios based on the distinct properties of breast tissue—namely its high transmittance and tomographic distribution—the phantom is constructed using a rectangular container made of highly translucent polymethyl methacrylate (PMMA) material, chosen to replicate the tissue’s superior light transmission compared to other biological tissues. The container houses solutions of fat emulsion at three concentrations (2%, 3%, and 5%) and six heterogeneous masses (two potato blocks, two carrot blocks, and two pumpkin blocks) with controlled size variations (0.7 cm × 0.7 cm × 1 cm) to mimic tumors in breast tissue. These vegetables are selected due to their optical and structural congruence with breast heterogeneities, enabling realistic simulation of light scattering and absorption patterns. The heterogeneities are positioned at two-thirds of the phantom’s width to emulate typical tumor locations, while the experimental setup includes a light source placed directly in front of the phantom and an industrial camera positioned behind it to capture transmitted light signals. According to the concentration of the fat emulsion, the gain of the industrial camera (model: JHSM120Bf, frame rate/resolution 29.4fps@1280x960, spectral response: 390–1030nm) is set to 3, 7 and 10, the exposure time is set to 5ms, 8ms and 10ms, and the sampling rate is set to 45 frames per second.
The entire experiment is conducted under a black cloth.

The schematic diagram of the modulator module circuit is shown in Fig 7, which utilizes a square wave to sine wave conversion circuit to generate the required 3.5 Hz/4 Hz sinusoidal signal, as illustrated in Fig 8. Fig 7a depicts the circuit schematic for square wave to sine wave conversion, where CD4060 serves as a 14-bit binary serial counter/divider, producing a high-precision square wave signal of 3.5/4 Hz at its 13th pin (Q9 pin). The precision of its output signal is fine-tuned by C3. The square wave signal, after attenuation via gain control network R2 and R3, enters the low-pass filtering circuit, retaining only the fundamental frequency signal, thereby obtaining the sine wave signal at the same frequency. The I/V conversion circuit employs the CMOS process integrated chopper-stabilized zero-drift operational amplifier ICL7650 from Maxim Integrated, as shown in Fig 7b. R1 acts as the input current-limiting protection resistor for ICL7650. The small resistors R2, R3 and R4 form a T-network, replacing the traditional use of large resistors to enhance gain stability and accuracy, and reduce noise. Additionally, for effective amplification of micro-current signals in the I/V conversion section, an inverted input type amplification circuit with a T-network is adopted, as shown in Fig 7c, producing an amplified voltage signal with a phase opposite to the input current signal.
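The principle behind the square-to-sine conversion—low-pass filtering keeps only the fundamental—can be illustrated with a small DFT sketch (pure Python; frequencies are expressed in DFT bins rather than Hz):

```python
import cmath
import math

N = 120                                   # samples over one square-wave period
square = [1.0 if n < N // 2 else -1.0 for n in range(N)]

def dft_mag(x, k):
    """Magnitude of the k-th DFT bin of sequence x."""
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / len(x))
                   for n, v in enumerate(x)))

fund = dft_mag(square, 1)                 # fundamental component
third = dft_mag(square, 3)                # first odd harmonic
assert dft_mag(square, 2) < 1e-9          # even harmonics vanish for a square wave
assert abs(third / fund - 1.0 / 3.0) < 0.01  # odd harmonics fall off as 1/k

# A low-pass filter that keeps only the fundamental therefore turns the
# 3.5 Hz / 4 Hz square wave into a sine wave of the same frequency.
```

This is exactly what the R2/R3 attenuator plus low-pass stage in Fig 7a exploits: everything above the fundamental is suppressed, leaving a clean sinusoid.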

Fig 7. Schematic diagram of sinusoidal forming signal generating circuit.

(a) Schematic diagram of square wave to sine wave circuit; (b) Schematic diagram of I/V conversion circuit; (c) Schematic diagram of two-stage amplifier circuit.

https://doi.org/10.1371/journal.pone.0329009.g007

Fig 8. Modulation frequency diagram of shaped signal.

(a) Frequency domain diagram of 3.5 Hz; (b) Frequency domain diagram of 4 Hz.

https://doi.org/10.1371/journal.pone.0329009.g008

Image acquisition and preprocessing

The original MTI are collected on the built experimental platform, and the original images are processed by FA, M_D, M_D-FA and SPM. The specific process is as follows:

  1. Acquisition of Original Multi-Spectral Image Sequences. LED arrays at the four wavelengths, loaded with 3.5 Hz and 4 Hz sinusoidal signals, are used to illuminate the phantom respectively, and the original MTI are obtained. Each wavelength LED array irradiates five phantom configurations (2 sets of 2% concentration fat emulsion, 2 sets of 3% concentration fat emulsion, and 1 set of 5% concentration fat emulsion). A total of 60 sets of original and modulated image data are obtained across all wavelengths, each set including 1,200 frames, for a total of 72,000 frames of MTI.
  2. Preprocessing of Original Multi-Spectral Image Sequences. The collected MTI are processed by FA, M_D and M_D-FA, respectively. a) FA processing of images. Taking a single set of multi-spectral images at the near-infrared wavelength as an example, the average gray value of 1,200 low-light level images is obtained, as shown in Fig 9. As seen in the figure, a single cycle of the sinusoidal signal spans 12 frames; every 12 consecutive frames are averaged in order to obtain the FA images for all wavelengths, yielding a total of 2,000 FA frames. b) Image M_D processing. All modulated images are subjected to the fast Fourier transform (FFT) to obtain the frequency coordinates of the loaded sinusoidal signal, as shown in Fig 10. Fig 10a corresponds to the frequency domain coordinates of 4 Hz, and Fig 10b to those of 3.5 Hz. According to these frequency domain coordinates, the multi-spectral images of all wavelengths are demodulated, yielding a total of 48,000 frames at the different frequencies. c) Image M_D-FA processing. Similarly, the images demodulated at the different frequencies undergo FA averaging over each set of 12 consecutive frames within a single sinusoidal signal cycle. This yields the M_D-FA images for all wavelengths, a total of 4,000 frames.
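Step a) above—averaging each 12-frame sinusoidal cycle—can be sketched as follows (toy 1 × 1 "images"; the Gaussian noise model is an assumption for illustration only):

```python
import random

def frame_accumulate(frames, period=12):
    """Average every `period` consecutive frames (one sinusoidal cycle) pixel-wise.
    `frames` is a list of 2-D images (lists of rows)."""
    out = []
    for start in range(0, len(frames) - period + 1, period):
        group = frames[start:start + period]
        h, w = len(group[0]), len(group[0][0])
        out.append([[sum(f[i][j] for f in group) / period for j in range(w)]
                    for i in range(h)])
    return out

# 1,200 low-light frames at 12 frames per cycle yield 100 accumulated frames;
# averaging suppresses zero-mean noise while preserving the underlying signal.
random.seed(0)
frames = [[[10.0 + random.gauss(0, 2)]] for _ in range(1200)]   # 1x1 "images"
fa = frame_accumulate(frames)
assert len(fa) == 100
vals = [f[0][0] for f in fa]
mean = sum(vals) / len(vals)
# The spread of accumulated values is much smaller than the raw noise level (2.0).
assert (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5 < 2.0
```

For uncorrelated noise, averaging n frames reduces the noise standard deviation by a factor of sqrt(n), which is the source of the SNR gains reported later in Table 1.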
Fig 9. Period diagram of near infrared light sinusoidal signal.

https://doi.org/10.1371/journal.pone.0329009.g009

Fig 10. 4 Hz and 3.5 Hz frequency domain coordinate diagram.

(a) Frequency domain coordinate diagram corresponding to 4 Hz; (b) Frequency domain coordinate diagram corresponding to 3.5 Hz.

https://doi.org/10.1371/journal.pone.0329009.g010

Image U-Net network semantic segmentation

  1. MTI U-Net Semantic Segmentation. Firstly, combine the four-wavelength original MTI according to the proportions of an RGB color image to obtain a pseudo-color image, as shown in Fig 11a. Then, input the obtained pseudo-color image into the U-Net semantic segmentation network model for training, obtaining semantic segmentation annotations for the six heterogeneities in pseudo-color image, as shown in Fig 11b. Finally, acquire the original segmentation images of six heterogeneities through mask processing, as shown in Fig 11c.
  2. Obtaining Pseudo-Color Images. Based on the proportional relationships of the RGB primary colors in a color image, the segmentation images are recombined to obtain original pseudo-color images, FA pseudo-color images, M_D pseudo-color images and M_D-FA pseudo-color images. The four-wavelength images are combined in the order of blue, green, red and near-infrared light, with each combination comprising 3 of the 4 wavelengths, resulting in a total of four combinations. This yields 144,000 frames of original pseudo-color images, 12,000 frames of FA pseudo-color images, 288,000 frames of M_D pseudo-color images, and 24,000 frames of M_D-FA pseudo-color images. The obtained pseudo-color images are divided successively according to the mask region, and the 6 different heterogeneities are shown in Fig 12. While the pseudo-color images in Fig 12 may appear visually similar at first glance, a closer examination reveals that the heterogeneities exhibit distinct spectral and morphological characteristics. These differences are highlighted in the zoomed-in insets added to each panel of Fig 12. For instance, carrot blocks show stronger absorption in the red channel (700nm), resulting in darker regions in the fused images (Fig 12b and 12e, see insets). In contrast, pumpkin blocks display higher near-infrared (860nm) reflectance, visible as brighter patches (Fig 12a and 12d, see insets). Potato cubes have the best light transmittance and the brightest images, with a more uniform texture (Fig 12c and 12f, see insets). These spectral differences, combined with the edge/texture features extracted by the Visual Geometry Group 16-Batch Normalization-Squeeze and Excitation-Global Average Pooling (VGG16_BN_SE_GAP) model (e.g., the conv3_3 layer in Fig 13), enable accurate classification. The FA and M_D preprocessing further enhance these discriminative features by reducing noise and amplifying weak signals, as evidenced by the SNR improvements in Table 1.
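The wavelength-to-RGB fusion described above can be sketched as follows (the band names are illustrative labels, not the paper's file naming):

```python
from itertools import combinations

wavelengths = ["435nm_blue", "546nm_green", "700nm_red", "860nm_nir"]

# Every choice of 3 out of the 4 wavelength images is fused into one
# pseudo-color image by assigning the chosen bands to the R, G and B channels.
combos = list(combinations(wavelengths, 3))
assert len(combos) == 4                  # C(4, 3) = 4 fused images per frame set
assert ("435nm_blue", "546nm_green", "700nm_red") in combos

def fuse(band_r, band_g, band_b):
    """Stack three single-band images (2-D lists) into one RGB pseudo-color image."""
    return [[(band_r[i][j], band_g[i][j], band_b[i][j])
             for j in range(len(band_r[0]))] for i in range(len(band_r))]

rgb = fuse([[0.9]], [[0.5]], [[0.1]])    # 1x1 toy bands
assert rgb[0][0] == (0.9, 0.5, 0.1)
```

Four combinations per frame set is consistent with the frame counts above: the fused totals are four times the per-combination counts.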
Table 1. Comparison of SNR and PSNR for different wavelength images under different preprocessing methods.

https://doi.org/10.1371/journal.pone.0329009.t001

Fig 11. U-Net network semantic segmentation process.

(a) Original pseudo-color image; (b) Semantic segmentation map of six different heterogeneities; (c) Mask segmentation of original image.

https://doi.org/10.1371/journal.pone.0329009.g011

Fig 12. Six types of heterogeneous pseudo-color images after U-Net network segmentation.

(a) Pseudo-color image of heterogeneity1; (b) Pseudo-color image of heterogeneity2; (c) Pseudo-color image of heterogeneity3; (d) Pseudo-color image of heterogeneity4; (e) Pseudo-color image of heterogeneity5; (f) Pseudo-color image of heterogeneity6. Red boxes on the main images and the corresponding zoomed-in insets highlight the representative regions of interest (ROIs), showcasing the distinct textural and contrast features discussed in the text.

https://doi.org/10.1371/journal.pone.0329009.g012

Fig 13. Original image and convolutional feature maps.

(a) Pseudo-color image of heterogeneous entities; (b) Feature map of conv1_2; (c) Feature map of conv2_2; (d) Feature map of conv3_3; (e) Feature map of conv4_3; (f) Feature map of conv5_3.

https://doi.org/10.1371/journal.pone.0329009.g013

VGG16 heterogeneous network detection model

  1. Dataset Creation. The six types of pseudo-color images combined after multi-spectral semantic segmentation are made into original dataset a, FA dataset b, 4 Hz M_D dataset c, 3.5 Hz M_D dataset d, 4 Hz M_D-FA dataset e and 3.5 Hz M_D-FA dataset f. The images in each dataset are divided into six categories: potato block 1, potato block 2, carrot block 1, carrot block 2, pumpkin block 1 and pumpkin block 2. Datasets a, b, c, d, e and f are randomly divided into training, validation and test sets using random functions, with the ratio set to 6:1:3 according to the traditional partition ratio in the field of machine learning.
  2. Model Training. In this study, the VGG16, VGG16_BN, VGG16_BN_SE and VGG16_BN_SE_GAP networks are chosen as heterogeneous classification models. Datasets a, b, c, d, e and f are input into both the original and enhanced VGG16 networks in the designated proportions for training, and the optimal model for each dataset is established by comparing the accuracy and F-score of the models. For the VGG16 network, the batch size is set to 32 and the Adam optimizer is used with an initial learning rate of 0.0001, a momentum factor of 0.9, a weight decay of 0.0005 and 30 iterations with a training step size of 100. Additionally, the parameters generated by ImageNet pre-training are retained for feature extraction during training. In the SE unit module, the scaling parameter r is set to 16. To prevent overfitting, a dropout rate of 0.5 is applied to the two fully connected layers in the 6th segment of the VGG16_BN_SE network, while the remaining parameters are initialized with random values drawn from a normal distribution.
  3. Data Visualization. The first 36 channel feature maps output by selected convolutional layers of the VGG16 network are visualized in Fig 13. As shown in Fig 13, the lower layers of the network, such as conv1_2, primarily extract color and edge features from the pseudo-color images of heterogeneities; the middle layers, such as conv3_3, mainly capture simple texture features; and the higher layers, such as conv5_3, predominantly extract finer, more abstract features of the heterogeneities.
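A minimal sketch of the 6:1:3 random split described in step 1, using only Python's standard library; the file names and seed are illustrative, not those used in the study:

```python
import random

def split_dataset(samples, ratios=(6, 1, 3), seed=42):
    """Randomly split samples into train/validation/test subsets
    according to the given ratio (6:1:3 in this study)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical example: 100 pseudo-color image paths for one class
paths = [f"heterogeneity1_{i:03d}.png" for i in range(100)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))  # 60 10 30
```

Fixing the seed makes the partition reproducible across training runs.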

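As an illustrative sketch of the squeeze-and-excitation recalibration performed by the SE unit (scaling parameter r = 16), the following NumPy code implements the squeeze (global average pooling), excitation (bottleneck fully connected layers with ReLU and sigmoid) and channel-wise rescaling steps; the weights and feature-map shapes are hypothetical, and biases are omitted for brevity:

```python
import numpy as np

def squeeze_excitation(feature_map, w1, w2):
    """Squeeze-and-Excitation recalibration of a (C, H, W) feature map.

    w1: (C//r, C) weights of the reduction FC layer
    w2: (C, C//r) weights of the expansion FC layer
    """
    # Squeeze: global average pooling over the spatial dimensions
    z = feature_map.mean(axis=(1, 2))              # shape (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid
    s = np.maximum(w1 @ z, 0.0)                    # shape (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))         # shape (C,), in (0, 1)
    # Scale: channel-wise reweighting of the feature map
    return feature_map * gate[:, None, None]

rng = np.random.default_rng(0)
C, r = 64, 16
fmap = rng.standard_normal((C, 8, 8))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out = squeeze_excitation(fmap, w1, w2)
print(out.shape)  # (64, 8, 8)
```

The output has the same shape as the input, so the unit can be dropped into the network after any convolutional stage without changing downstream layer sizes.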
Results and analysis

After FA and M_D processing, the quality of the MTI is significantly improved: the gray levels of the images are stretched, the SNR is increased, and image definition is clearly better. Pseudo-color images fused from different wavelengths enable effective classification of heterogeneities in both the original and enhanced VGG16 networks.

(1) Both FA and M_D significantly improved image quality. To demonstrate the changes in image quality before and after preprocessing, the SNR and PSNR are calculated, as defined in equations (5)-(7); the results are presented in Table 1. From Table 1, the following observations can be made: ①All wavelength images obtained from the different preprocessing methods exhibit an increase in SNR compared to the original images. Higher SNR values indicate better image quality and clearer visualization of the tissues. Among them, the 3.5 Hz M_D image in the blue wavelength showed the largest increase in SNR, of 36.47%. ②The PSNR values for all wavelength images under the different preprocessing methods are positive, indicating a significant increase in the grayscale levels of the images after preprocessing. This enhancement benefits the classification of heterogeneities within the images.

In addition, we note that the opposite trends of SNR and PSNR in Table 1 reflect the distinct priorities of the two metrics. The SNR improvements (e.g., the 36.47% increase for 3.5 Hz M_D in the blue wavelength) confirm enhanced noise suppression, while PSNR reductions indicate deviations from the reference caused by preprocessing-induced structural changes. For instance, FA reduces noise but may blur fine details, increasing the MSE; similarly, M_D-FA enhances weak signals but alters grayscale distributions, reducing fidelity to the original reference. Clinically, this trade-off is advantageous: higher SNR facilitates feature extraction for deep learning models, even at the cost of pixel-level accuracy.

SNR = 10·log10(μ² / σ²), with σ² = (1 / (M × N)) Σᵢ Σⱼ [x(i, j) − μ]² (5)

MSE = (1 / (M × N)) Σᵢ Σⱼ [I(i, j) − K(i, j)]² (6)

PSNR = 10·log10(255² / MSE) (7)

where x(i, j) denotes the pixel values of the image; μ denotes the average value of the image pixels; σ² denotes the variance of the pixel values; M × N denotes the size of the image; I denotes the ground-truth image (here the original image is used as the reference) and K denotes the processed image. Higher SNR indicates better noise suppression, while higher PSNR implies smaller pixel-level deviations from the reference. In other words, SNR focuses on noise reduction (signal preservation versus noise), whereas PSNR penalizes structural deviations from the reference (e.g., blurring or contrast shifts).
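For illustration, the SNR and PSNR metrics described above can be computed with NumPy as follows; the 2 × 2 sample arrays are hypothetical:

```python
import numpy as np

def snr_db(image):
    """SNR as the ratio of mean signal power to pixel variance, in dB."""
    mu = image.mean()
    var = image.var()
    return 10.0 * np.log10(mu**2 / var)

def psnr_db(reference, processed, max_val=255.0):
    """PSNR of a processed image against a reference image, in dB."""
    mse = np.mean((reference.astype(float) - processed.astype(float)) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

# Hypothetical 2x2 grayscale patches (8-bit range)
ref = np.array([[100, 110], [120, 130]], dtype=float)
proc = ref + np.array([[1, -1], [2, -2]], dtype=float)
print(round(snr_db(ref), 2), round(psnr_db(ref, proc), 2))  # 20.24 44.15
```

Note that PSNR is undefined (infinite) when the two images are identical, so it is only meaningful for processed images that differ from the reference.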

(2) Both the original and enhanced VGG16 network models effectively achieve the classification of heterogeneities in multi-spectral images. To comprehensively evaluate model performance, this paper takes accuracy, precision, recall and F-score as evaluation indicators. True Positive (TP) denotes a positive sample correctly predicted as positive, True Negative (TN) a negative sample correctly predicted as negative, False Positive (FP) a negative sample incorrectly predicted as positive, and False Negative (FN) a positive sample incorrectly predicted as negative. Recall and precision are typically in tension: when precision is high, recall tends to be low, and vice versa, so a network for which both recall and precision are high has better classification performance. The F-score measures recall and precision jointly. These evaluation indicators are calculated from the confusion matrix as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (8)

Precision = TP / (TP + FP) (9)

Recall = TP / (TP + FN) (10)

F-score = (2 × Precision × Recall) / (Precision + Recall) (11)
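The confusion-matrix indicators above can be computed directly from the four counts; the counts in this example are hypothetical:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Hypothetical counts for one heterogeneity class
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print(f"{acc:.3f} {prec:.3f} {rec:.3f} {f1:.3f}")  # 0.875 0.900 0.857 0.878
```

In the six-class setting of this study, these quantities are computed per class (one-vs-rest) and then averaged across the six heterogeneities.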

The enhanced VGG16 models progressively improved the classification accuracy of heterogeneities in multi-spectral images. Pseudo-color images fused from different wavelengths are trained separately in the VGG16, VGG16_BN, VGG16_BN_SE and VGG16_BN_SE_GAP networks to classify heterogeneities in multi-spectral images. As can be seen from Table 2: ①Compared to the original VGG16 model, the enhanced VGG16 models progressively increased the classification accuracy of heterogeneities in transmission images. The VGG16_BN_SE_GAP model exhibits the largest improvement, reaching 8.90% ± 0.75%, followed by the VGG16_BN_SE and VGG16_BN models. ②Across the different preprocessing methods, all models improved the classification accuracy of heterogeneities compared to the original images. In the VGG16_BN_SE_GAP model, the highest classification accuracy is achieved on the 3.5 Hz M_D-FA images, reaching 97.57% ± 0.50%. ③Among the classification models, the original VGG16 model achieves its highest classification accuracy on the 3.5 Hz M_D images, whereas the other enhanced VGG16 models achieve their highest accuracy on the 3.5 Hz M_D-FA images, reaching 92.18% ± 0.65%, 95.64% ± 0.60% and 97.57% ± 0.50%, respectively. ④With the normalization of convolutional layers and the addition of GAP in the enhanced VGG16 models, the classification accuracy of heterogeneities gradually increased across the different image preprocessing methods (FA, M_D, M_D-FA). ⑤In the enhanced VGG16 models, the overall average classification accuracy of the 3.5 Hz preprocessing methods is higher than that of the 4 Hz preprocessing methods.

Table 2. Average classification results of six heterogeneities using original and enhanced VGG16 network models.

https://doi.org/10.1371/journal.pone.0329009.t002

The classification accuracy of heterogeneities in Table 3 reflects the models' ability to discriminate tumor-like anomalies from normal tissue analogs. For example, H6 (potato blocks) achieved 99.10% ± 1.02% accuracy in the VGG16_BN_SE_GAP model, demonstrating exceptional sensitivity to high-density, high-absorption features akin to malignant tumors. This aligns with clinical observations that malignant lesions exhibit pronounced optical heterogeneity due to irregular angiogenesis and hemoglobin accumulation. Conversely, the lower accuracy for H1 (pumpkin blocks, 96.55%) suggests challenges in distinguishing subtle benign lesions from normal tissue, mirroring diagnostic difficulties in early-stage screening. The superior performance of the 3.5 Hz M_D-FA preprocessing method (97.57% ± 0.50% overall accuracy) likely stems from its enhanced SNR in capturing dynamic vascular patterns, a critical factor in tumor detection.

Table 3. Classification results of six heterogeneities using different network models.

https://doi.org/10.1371/journal.pone.0329009.t003

Conclusion

This study demonstrates significant translational potential for advancing early breast cancer detection through the integration of M_D-FA techniques and enhanced deep learning models. By achieving a classification accuracy of 97.57% ± 0.50% for heterogeneities in MTI, our framework provides a radiation-free, cost-effective alternative to conventional imaging modalities, directly addressing the clinical need for safer and more accessible screening tools. The experimental validation using phantoms—designed to replicate the optical properties of breast tissue—establishes a critical foundation for future clinical trials, as the controlled improvements in SNR and image clarity (e.g., 36.47% SNR increase) correlate strongly with enhanced diagnostic precision in real tissue. In preparation for these critical next steps, we are currently in the process of obtaining IRB approval and establishing collaborations with affiliated hospitals to acquire clinically annotated breast tissue images. These data will be used to validate the proposed framework in real-world scenarios, with a focus on correlating optical heterogeneities with histopathological findings. We anticipate that these efforts will facilitate the clinical translation of our method and enhance its diagnostic reliability. However, the synergistic effects among the various techniques (e.g., M_D-FA, U-Net segmentation, and VGG16 enhancements) are not yet fully explored, potentially limiting the fusion efficiency between algorithmic components. To bridge this gap and ensure robust applicability to biological tissues, future work will prioritize optimizing the integration of these techniques, validation of the model on human-derived datasets, and calibration of wavelength-specific parameters to account for tissue variability (e.g., hemoglobin absorption). 
Furthermore, future work must address two critical steps for clinical translation: (1) Correlation of optical heterogeneities with histopathological findings in biopsy-confirmed breast tissues; (2) Integration of clinical metadata to refine classification thresholds for benign vs. malignant discrimination. Additionally, optimizing frequency-specific modulation (e.g., 3.5 Hz vs. 4 Hz) to target tumor-specific hemodynamic patterns could further enhance diagnostic specificity.

Acknowledgments

We thank the State Key Laboratory of Precision Measuring Technology and Instruments for the use of their equipment.

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209–49. pmid:33538338
  2. Cao W, Chen HD, Yu YW. Changing profiles of cancer burden worldwide and in China: A secondary analysis of the global cancer statistics 2020. Chinese Med J-Peking. 2021;134(7):783–91.
  3. Fan L, Strasser-Weippl K, Li JJ. Breast Cancer in China. Lancet Oncol. 2014;15(7):e279–89.
  4. Giaquinto AN, Sung H, Miller KD, Kramer JL, Newman LA, Minihan A, et al. Breast cancer statistics, 2022. Ca-Cancer J Clin. 2022;72:524–41.
  5. Hofvind S, Moshina N, Holen ÅS, Danielsen AS, Lee CI, Houssami N, et al. Interval and Subsequent Round Breast Cancer in a Randomized Controlled Trial Comparing Digital Breast Tomosynthesis and Digital Mammography Screening. Radiology. 2021;300(1):66–76. pmid:33973840
  6. Coronado-Gutiérrez D, Santamaría G, Ganau S, Bargalló X, Orlando S, Oliva-Brañas ME, et al. Quantitative Ultrasound Image Analysis of Axillary Lymph Nodes to Diagnose Metastatic Involvement in Breast Cancer. Ultrasound Med Biol. 2019;45(11):2932–41. pmid:31444031
  7. Taralli S, Lorusso M, Perrone E, Perotti G, Zagaria L, Calcagni ML. PET/CT with Fibroblast Activation Protein Inhibitors in Breast Cancer: Diagnostic and Theranostic Application-A Literature Review. Cancers (Basel). 2023;15(3):908. pmid:36765866
  8. Alikhassi A, Li X, Au Fdr. False-positive incidental lesions detected on contrast-enhanced breast MRI: clinical and imaging features. Breast Cancer Research and Treatment. 2023.
  9. Ntziachristos V, Chance B. Probing physiology and molecular function using optical imaging: applications to breast cancer. Breast Cancer Res. 2001;3(1):41–6. pmid:11250744
  10. Yang X, Li G, Lin L. Assessment of spatial information for hyperspectral imaging of lesion. Conference on Optics in Health Care and Biomedical Optics VII. 2016.
  11. Jiang F, Liu P, Zhou X. Multilevel fusing paired visible light and near-infrared spectral images for face anti-spoofing. Pattern Recognition Letters. 2019;128:30–7.
  12. Rui H, Yunhao Z, Shiming T, Yang Y, Wenhai Y. Fault point detection of IOT using multi-spectral image fusion based on deep learning. Journal of Visual Communication and Image Representation. 2019;64:102600.
  13. Li G, Tang H, Kim D, Gao J, Lin L. Employment of frame accumulation and shaped function for upgrading low-light-level image detection sensitivity. Opt Lett. 2012;37(8):1361–3. pmid:22513686
  14. Yang X, Hu Y, Li G, Lin L. Effect on measurement accuracy of transillumination using sawtooth-shaped-function optical signal. Rev Sci Instrum. 2016;87(11):115106. pmid:27910699
  15. Hu YJ, Yang X, Wang MJ. Optimum method of image acquisition using sawtooth-shaped-function optical signal to improve grey-scale resolution. J Mod Optic. 2016;63:1539–43.
  16. Zhang BJ, Zhang CC, Li G. Multispectral heterogeneity detection based on frame accumulation and deep learning. IEEE Access. 2019;1(1):1–1.
  17. Zhang BJ, Zhang CC, Li G. A preprocessing algorithm based on heterogeneity detection for transmitted tissue image. Eurasip J Wirel Comm. 2019;209.
  18. Liu FL, Li G, Yang SQ. Detection of heterogeneity in multi-spectral transmission image based on spatial pyramid matching model and deep learning. Opt Laser Eng. 2020;134.
  19. Verburg E, van Gils CH, van der Velden BHM, Bakker MF, Pijnappel RM, Veldhuis WB, et al. Deep Learning for Automated Triaging of 4581 Breast MRI Examinations from the DENSE Trial. Radiology. 2022;302(1):29–36. pmid:34609196
  20. Mohamad DNFP, Mashohor S, Mahmud R. Transition of traditional method to deep learning based computer-aided system for breast cancer using automated breast ultrasound system (ABUS) images: a review. Artif Intell Rev. 2023;56(12):15271–300.
  21. Naeem OB, Saleem Y, Khan MUG. Breast mammograms diagnosis using deep learning: state of art tutorial review. Arch Comput Method E. 2024.
  22. Ting FF, Tan YJ, Sim KS. Convolutional neural network improvement for breast cancer classification. Expert Systems with Applications. 2019;120:103–15.
  23. Shen L, Margolies LR, Rothstein JH. Deep learning to improve breast cancer detection on screening mammography. Sci Rep-Uk. 2019;9.
  24. Khan S, Islam N, Jan Z, Ud Din I, Rodrigues JJPC. A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recognition Letters. 2019;125:1–6.
  25. Montaha S, Azam S, Rafid AKMRH, Ghosh P, Hasan MZ, Jonkman M, et al. BreastNet18: A High Accuracy Fine-Tuned VGG16 Model Evaluated Using Ablation Study for Diagnosing Breast Cancer from Enhanced Mammography Images. Biology (Basel). 2021;10(12):1347. pmid:34943262
  26. Ciobotaru A, Bota MA, Goța DI, Miclea LC. Multi-Instance Classification of Breast Tumor Ultrasound Images Using Convolutional Neural Networks and Transfer Learning. Bioengineering (Basel). 2023;10(12):1419. pmid:38136010
  27. Kollem S, Sirigiri C, Peddakrishna S. A novel hybrid deep CNN model for breast cancer classification using Lipschitz-based image augmentation and recursive feature elimination. Biomedical Signal Processing and Control. 2024;95:106406.
  28. Jahangeer GSB, Rajkumar TD. Early detection of breast cancer using hybrid of series network and VGG-16. Multimed Tools Appl. 2020;8(5):7853–86.
  29. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. Springer. 2015.
  30. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Computer Science. 2014.
  31. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. JMLR.org. 2015.
  32. Hu J, Shen L, Albanie S. Squeeze-and-Excitation Networks. IEEE T Pattern Anal. 2020;42(8):2011–23.