Abstract
Breast cancer is a highly heterogeneous malignant tumor, and its accurate classification is of great significance for clinical diagnosis and treatment decision-making. In recent years, convolutional neural networks (CNNs) and Transformers have been widely used in pathological image analysis of breast cancer. Although the former excels at capturing local information and the latter at modeling global dependencies, CNNs are limited by fixed sampling positions and struggle to characterize irregular cell morphology, while Transformers are insufficient at describing the two-dimensional spatial structure of cells and tissues. To address these issues, this paper proposes a Spatial–Frequency Domain Feature Extraction Model (S-FDFEM) that integrates spatial- and frequency-domain information to enhance feature learning for pathological image recognition. Specifically, in the spatial domain, Deformable Bottleneck Convolution (DBottConv) is used to effectively represent the intricate morphological variations of cells and tissues in pathological images and to improve the expressiveness of local features. In the frequency domain, wavelet low-frequency and Fourier high-frequency components are generated from the input pathological images to capture global approximations and fine structures; a Statistical Transformer and a depth gradient feature extraction module then operate on these two frequency-domain components, enabling global dynamic focusing and two-dimensional spatial characterization of pathological images. Experimental results on the BreakHis and BACH breast cancer pathological image classification datasets verify the superiority of the proposed S-FDFEM.
Citation: He L, Hu H, Cheng R (2026) Breast cancer histopathological image classification based on collaborative multi-domain feature learning. PLoS One 21(1): e0341320. https://doi.org/10.1371/journal.pone.0341320
Editor: Xiaohui Zhang, Bayer Crop Science United States: Bayer CropScience LP, UNITED STATES OF AMERICA
Received: September 8, 2025; Accepted: January 6, 2026; Published: January 27, 2026
Copyright: © 2026 He et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The datasets used in this article are publicly available and can be obtained from the following websites: BreakHis: https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/. BACH: https://zenodo.org/record/3632035, https://doi.org/10.1016/j.media.2019.05.010.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Breast cancer remains the most prevalent malignancy among women globally, representing a major contributor to cancer-related morbidity and mortality [1–4]. The increasing incidence of this disease, driven by complex interactions among genetic, hormonal, environmental, and lifestyle factors, underscores the urgent need for improved diagnostic and therapeutic strategies. Early and accurate detection is crucial for enhancing patient outcomes, reducing mortality rates, and enabling personalized treatment approaches. While traditional diagnostic methods such as mammography, ultrasonography, and tissue biopsy have advanced clinical practice, they still present significant limitations. Mammography, the primary screening tool, has reduced sensitivity in dense breast tissue, particularly in younger women, which can result in false-negative findings [5]. Although ultrasonography is useful for differentiating benign from malignant lesions, it lacks the spatial resolution necessary to detect small or early-stage tumors [6]. Tissue biopsy, although definitive, is invasive, costly, and impractical for routine screening [7], highlighting the need for robust, non-invasive, and accurate diagnostic methods.
Histopathological analysis, enabled by Whole Slide Imaging (WSI), has become the gold standard for breast cancer diagnosis, as it provides detailed insights into cellular morphology, tissue architecture, and molecular marker profiles [8]. However, manual histopathological evaluation is time-consuming, prone to inter-observer variability, and limited by human cognitive capacity [9].
In recent years, advancements in deep learning technologies, specifically convolutional neural networks (CNNs) and Transformers, have greatly improved the extraction of features from pathological images. These methods, known for their end-to-end learning capability, robust nonlinear modeling performance, and effective representation of intricate semantic relationships within images, have demonstrated outstanding performance in analyzing high-dimensional pathological image data. As a result, they have effectively addressed critical challenges in pathological image classification [10]. CNNs have demonstrated considerable success in extracting local features from pathological images. For instance, Xu et al. [11] developed a multidimensional feature extraction network, MDFF-NET, which integrates one-dimensional and two-dimensional convolutional operations to effectively capture intricate pathological characteristics and encode high-level semantic information from histopathological image data. George et al. [12] detected non-overlapping cell nucleus blocks from histopathology images and proposed a low-complexity CNN for feature extraction; the extracted CNN features were then classified using a strategy that combined feature fusion with a support vector machine (FF+SVM). Li et al. [13] proposed a novel breast cancer histopathological image classification network (BC-MBINet) based on multiple convolutional layers, achieving an accuracy of 99.04%. Joseph et al. [14] proposed a fully automated and robust dual multi-scale convolutional neural network capable of extracting linearly separable and scale-invariant features to address the challenges posed by variations in resolution, texture diversity, and high coherence in breast cancer histopathological images.
Traditional convolution operations are constrained by a fixed receptive field, limiting their ability to capture the global structural information of lesion regions and thereby hindering further improvements in classification performance. Transformers address this limitation by effectively modeling global context and long-range dependencies. Li et al. [15] incorporated BiFormer into the feature extraction stage to strengthen global feature interactions and improve semantic information transfer in histopathological images. Similarly, Hao et al. [16] proposed ST Double Net, a two-stage Swin Transformer–based architecture that integrates global and local features to enhance feature diversity. Gao et al. [17] further introduced HTransMIL, a Transformer-based hybrid multiple instance learning framework that captures comprehensive contextual information and improves inter-class discriminability.
In summary, CNNs and Transformers extract features from local and global regions, respectively, transforming a high-resolution image into a set of high-dimensional features containing rich information. Although they have achieved good results in the classification of breast cancer pathological images, they still have limitations that prevent a full representation of pathological features. On the one hand, CNNs rely on fixed-size convolutional kernels to extract features within a local receptive field. This static sampling scheme makes it difficult to adaptively model the irregular tissue and cell morphology in complex pathological images, limiting the model's ability to characterize both microstructure and overall morphology. On the other hand, through its self-attention mechanism, the Transformer can automatically model long-range dependencies between arbitrary pixel regions in an image, effectively promoting the interaction of cross-regional tissue structure and cell morphology information in pathological images. However, self-attention has a limitation in processing pathological images: it lacks the ability to explicitly model two-dimensional spatial structures, making it difficult to accurately perceive the arrangement and spatial distribution patterns of cell populations. These morphological and spatial features are precisely the important pathological basis for distinguishing cancer subtypes and evaluating malignancy [18].
To this end, we propose a spatial-frequency domain feature extraction model (S-FDFEM) that fully considers irregular cell and tissue morphology in the spatial domain, and introduces frequency domain characteristics to promote the two-dimensional spatial information expression of pathological images. The main contributions can be summarized as follows:
1. We design a novel multi-domain collaborative feature learning architecture, the Spatial–Frequency Domain Feature Extraction Model (S-FDFEM), which combines local, global, and spatial representations to enhance the ability to capture complex, multi-morphological structures in pathological images.
2. We propose a frequency domain global-spatial feature extraction module (FD-GSFE), which is used to enhance the perception of global and two-dimensional spatial features from breast cancer pathological images, so that the model can learn features that are critical to cancer discrimination.
3. We design a spatial domain feature extraction (SDFE) module that adaptively captures local features of irregular tissue morphology in cancer regions, effectively overcoming the limitations of fixed receptive fields in traditional convolutional kernels and further improving feature representation and classification performance.
2 Related work
Convolutional neural networks (CNNs): CNNs have become a cornerstone in image analysis. Classical architectures such as ResNet, GhostNet, MobileNet, and ShuffleNet have demonstrated strong performance by effectively capturing local morphological features, including nuclear shape and tissue structure, thereby achieving high classification accuracy [19–22]. To further enhance feature representation, Zou et al. [23] proposed DsHoNet, which combines covariance pooling with Ghost modules to construct a lightweight network enriched with high-order features. Liu et al. [24] developed CTransNet, leveraging a pretrained DenseNet and transfer learning to achieve strong cross-dataset generalization. Hou et al. [25] introduced a method integrating convolutional Long Short-Term Memory (LSTM) with an adaptive weighted bilateral multidimensional attention mechanism, improving feature quality while mitigating class imbalance. However, these approaches rely on convolution kernels with fixed receptive fields. When applied to structurally complex and morphologically diverse breast cancer histopathological images, their local receptive fields are limited, restricting the ability to adapt to irregular tissue distributions and cellular arrangements. To address this limitation, we propose a Deformable Bottleneck Convolution (DBottConv) structure that dynamically adjusts convolution sampling locations through learnable offsets, enabling flexible extraction of discriminative pathological features.
Histogram of Oriented Gradients (HOG): HOG plays a crucial role in image feature representation, effectively capturing the gradient magnitude and direction of each pixel [26]. This method has been widely applied. For example, R. Newlin Shebiah et al. [27] combined HOG with cascaded AdaBoost classifiers for human body part detection, achieving promising results, while Yang et al. [28] improved HOG and applied it to spectral image recognition, enabling polarization spectral analysis. However, when processing structurally complex breast cancer histopathological images, traditional HOG relies solely on horizontal and vertical gradient calculations, limiting its ability to capture irregular pathological features. To address this limitation, we propose a depth gradient feature extraction module that extends gradient computation to diagonal directions to enhance multi-directional structural perception. Finally, we integrated it into the Fourier high-frequency spatial feature extraction branch to enhance two-dimensional spatial perception.
Frequency domain analysis: Frequency-domain analysis provides a new methodological perspective for image processing, significantly enhances the expressive power of image features, and shows unique advantages in breast cancer pathological image processing. In pathological images, high-frequency information usually corresponds to detailed features such as cell edges, texture structures, and contours of dense areas, which enhance the recognizability of small shapes, while low-frequency information reflects the global structure of the image, revealing the overall distribution of tissues and cells. For example, Li et al. [29] proposed a dual-branch adaptive fusion network that uses the Fourier transform to enhance high-frequency and low-frequency components at the local and global feature levels, respectively; by highlighting the complementarity between details and the overall structure, it improves the richness and discriminability of feature representations of breast cancer pathological images. Yan et al. [30] proposed DWNAT-Net, which extracts deep frequency-domain features based on the discrete wavelet transform, further enhancing the frequency-domain representation of pathological images and effectively improving classification performance. However, wavelet high-frequency components contain significant noise, which distorts the feature distribution and obscures key pathological details. The research by Tan [31] shows that Fourier high-frequency images have clear texture expression, highlighting edge and detail features (as shown in Fig 1). Therefore, we propose the Frequency Domain Global-Spatial Feature Extraction (FD-GSFE) module, which combines wavelet low-frequency and Fourier high-frequency representations to simultaneously capture global structural approximations and fine pathological details, thereby improving the robustness and accuracy of feature representation.
Spatial distribution characteristics: Spatial distribution characteristics are crucial for cancer type discrimination in pathological images. At present, Graph Neural Network (GNN) is the most commonly used spatial feature extraction method, which aggregates and updates information between nodes by constructing the topological structure between image nodes, and has achieved good results in pathological image analysis. For example, Liu et al. [32] proposed a hierarchical pyramid model that integrates GNN into a pyramid structure to learn the geometric spatial representation of pathological images. However, when tissues and cells are densely arranged in pathological images, traditional graph neural networks are difficult to distinguish between cell and tissue boundaries, which can easily lead to misjudgment of the lesion area. To address this limitation, we introduce a Fourier high-frequency spatial feature extraction branch in the FD-GSFE module, which derives depth gradient features from high-frequency components to enhance the model’s two-dimensional spatial perception.
In summary, S-FDFEM addresses the limitations of prior models by introducing a spatial–frequency collaborative feature extraction framework that jointly leverages local, global, and spatial distribution information. Different from hybrid models such as DWNAT-Net [30], which primarily enhance feature representations within a single domain, S-FDFEM explicitly integrates complementary information across the spatial and frequency domains, fully leveraging the unique advantages of each domain to capture fine-grained structural patterns. At the same time, S-FDFEM combines the frequency domain with two-dimensional spatial distribution for the first time, compensating for the lack of spatial distribution characterization of pathological images in existing methods.
3 Key theories and technologies
This section elaborates on the theoretical methods used in this paper: the wavelet transform, the Fourier transform, and the statistical attention mechanism [33]. The attention mechanism is combined with the Transformer architecture in Sect 4.2.2 to form the Statistical Transformer.
3.1 Wavelet transform
High-frequency and low-frequency components are extracted through the two-dimensional discrete wavelet transform (DWT) in this study. Let $I \in \mathbb{R}^{H \times W \times C}$ denote the input feature map. The Haar wavelet transform divides the image into four frequency-domain sub-bands, where the filters used are shown in Eq (1):

$$f_{LL}=\frac{1}{2}\begin{bmatrix}1&1\\1&1\end{bmatrix},\quad f_{LH}=\frac{1}{2}\begin{bmatrix}-1&-1\\1&1\end{bmatrix},\quad f_{HL}=\frac{1}{2}\begin{bmatrix}-1&1\\-1&1\end{bmatrix},\quad f_{HH}=\frac{1}{2}\begin{bmatrix}1&-1\\-1&1\end{bmatrix} \tag{1}$$

where $I_{LL}$ represents the low-frequency component and $I_{LH}$, $I_{HL}$, $I_{HH}$ represent the high-frequency components. Finally, the original image is reconstructed through the inverse wavelet transform (IWT) as follows:

$$I_{output} = \mathrm{IWT}(I_{LL}, I_{LH}, I_{HL}, I_{HH})$$

where $I_{output}$ effectively synthesizes the decomposed frequency components back into the spatial domain.
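As a concrete illustration, the Haar decomposition and reconstruction described above can be sketched in NumPy; the sub-band sign conventions below follow one common orthonormal Haar variant and are an assumption, not the paper's exact implementation:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: split an (H, W) map into four sub-bands."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]   # top-left / top-right pixels
    c, d = img[1::2, 0::2], img[1::2, 1::2]   # bottom-left / bottom-right pixels
    ll = (a + b + c + d) / 2.0                # low-frequency approximation
    lh = (-a - b + c + d) / 2.0               # horizontal detail
    hl = (-a + b - c + d) / 2.0               # vertical detail
    hh = (a - b - c + d) / 2.0                # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse Haar transform: synthesize the sub-bands back to (2H, 2W)."""
    out = np.zeros((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll - lh - hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll + lh + hl + hh) / 2.0
    return out
```

Because the basis is orthonormal, `haar_idwt2(*haar_dwt2(x))` recovers `x` exactly, which is the perfect-reconstruction property the frequency-domain branch relies on.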
3.2 Fourier transform
Assume the input image is $I \in \mathbb{R}^{H \times W}$. The input image undergoes frequency transformation via the fast Fourier transform (FFT), transitioning from the spatial to the frequency-domain representation. The mathematical formulation of this conversion is expressed as:

$$F(m,n)=\sum_{x=0}^{H-1}\sum_{y=0}^{W-1} I(x,y)\, e^{-j2\pi\left(\frac{mx}{H}+\frac{ny}{W}\right)}$$

To achieve frequency-selective decomposition, frequency separation is performed through Gaussian filtering in the Fourier domain:

$$H_{low}(m,n)=e^{-\frac{D^2(m,n)}{2D_0^2}},\qquad H_{high}(m,n)=1-e^{-\frac{D^2(m,n)}{2D_0^2}}$$

where $D(m,n)$ is the distance from the frequency point $(m,n)$ to the center of the spectrum and $D_0$ is the cut-off frequency. Multiplying the frequency-domain representation with the filters yields the filtered high-frequency component $G_{high}(m,n)$ and low-frequency component $G_{low}(m,n)$.

Finally, since only the high-frequency features are utilized, the inverse Fourier transform is applied to the high-frequency component to reconstruct a spatial-domain image:

$$I_{high}(x,y)=\mathrm{IFFT}\big(G_{high}(m,n)\big)$$
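A minimal NumPy sketch of this high-pass step; the spectrum is centered with `fftshift` so that `D(m,n)` is the distance to the array center, and the default cut-off `d0` is an illustrative assumption:

```python
import numpy as np

def fourier_highpass(img, d0=10.0):
    """Gaussian high-pass filtering of a single-channel image in the Fourier domain."""
    H, W = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))            # center the spectrum
    u = np.arange(H) - H // 2
    v = np.arange(W) - W // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2           # squared distance D^2(m, n)
    H_high = 1.0 - np.exp(-D2 / (2.0 * d0 ** 2))     # Gaussian high-pass transfer
    G_high = F * H_high                               # filtered high-frequency spectrum
    return np.real(np.fft.ifft2(np.fft.ifftshift(G_high)))
```

Since the transfer function is zero at the spectrum center, the DC (mean) component is removed: a constant image maps to an all-zero output, while edges and fine texture are preserved.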
3.3 Statistical attention
Let the input image be denoted as $I \in \mathbb{R}^{N \times C}$, where $N = H \times W$ is the spatial dimension and $C$ is the number of channels. The specific steps of statistical attention [33] are as follows.

Firstly, a linear transformation followed by a rearrangement operation is applied to $I$ to generate the multi-head representation:

$$X = T(IW)$$

where $W$ is the projection matrix and $T$ is the rearrangement operation, $X \in \mathbb{R}^{h \times N \times d}$, $d = C/h$, $d$ represents the dimension of each head, and $h$ represents the number of attention heads. Subsequently, $X$ is normalized along the $N$ dimension to obtain $O$, and the attention distribution $A$ is derived from $O$ after scaling by $\tau$ and applying a softmax:

$$O = \gamma\,\frac{X-\mu}{\sqrt{\sigma^2+\varepsilon}}+\beta$$

where $\mu$ and $\sigma^2$ are the mean and variance along the $N$ dimension, $\gamma$ and $\beta$ are learnable parameters, $\varepsilon$ is a small constant to prevent the denominator from being zero, and $\tau$ is a learnable temperature parameter used to scale the attention distribution.

Then, the attention scores $dots$ are computed from the normalized distribution $A_{norm}$, which is expanded with an additional dimension (unsqueeze) before being applied.

Next, the weighted image $output_{weigh}$ is obtained by multiplying the attention weights $attn$ with the matrix $O$.

Finally, the weighted image $output_{weigh}$ is processed through an LN layer and an MLP layer to obtain the output features, which are rearranged back to the size $H \times W \times C$ to restore the original image dimensions.
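The steps above can be sketched as follows; since the exact formulation of [33] is not fully reproduced here, the projection, per-head normalization along the N dimension, temperature-scaled softmax, and element-wise weighting below are a simplified single-image approximation, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def statistical_attention(I, W, h, gamma=1.0, beta=0.0, tau=1.0, eps=1e-6):
    """Simplified sketch: I is (N, C), W is (C, C), h is the number of heads."""
    N, C = I.shape
    d = C // h
    X = (I @ W).reshape(N, h, d).transpose(1, 0, 2)   # multi-head form (h, N, d)
    mu = X.mean(axis=1, keepdims=True)                # statistics along N
    var = X.var(axis=1, keepdims=True)
    O = gamma * (X - mu) / np.sqrt(var + eps) + beta  # normalized features
    A = softmax(tau * O, axis=1)                      # attention distribution over N
    out = A * O                                       # element-wise weighting
    return out.transpose(1, 0, 2).reshape(N, C)       # rearrange back to (N, C)
```

A subsequent LN and MLP stage (omitted here) would produce the final output of the mechanism.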
4 Proposed model
4.1 Overall architecture
We propose a spatial-frequency domain feature extraction model (S-FDFEM). As illustrated in Fig 2, S-FDFEM is designed based on a pyramid structure, which can extract multi-scale fine-grained features. It consists of a stem and four stages. The stem performs preliminary processing on pathological images, and the subsequent four stages are used to extract shallow and deep pathological features. Due to the rich and relatively complete pathological information contained in shallow features, we only apply Frequency Domain Global-Spatial Feature Extraction (FD-GSFE) to the first two stages of the pyramid structure, and the output of the SDFE module is directly used as the input of the FD-GSFE module. Finally, the classifier outputs its category. Specifically, we use Spatial Domain Feature Extraction (SDFE) as the backbone to capture local features. FD-GSFE captures global and spatial perception.
(a) Structure diagram of the S-FDFEM model. (b) Composition structure of the SDFE module. (c) Composition structure of the FD-GSFE module.
Next, we describe these main modules separately.
4.2 Composition of S-FDFEM model
4.2.1 SDFE module.
Deformable convolution parameterizes sampling positions through learned offsets, enabling flexible receptive field adaptation [34] and better alignment with irregular morphological features in tumor regions. Bottleneck convolution (BC) compresses feature representations into a compact low-dimensional form, thereby reducing computational complexity and enhancing efficiency while preserving representational capacity [35]. Therefore, we integrate deformable and bottleneck convolutions to devise the Deformable Bottleneck Convolution (DBottConv), which effectively captures local features from irregular cellular and tissue structures. Specifically, a DBottConv, Group Normalization (GN), and ReLU form a Feature Enhancement Unit (FEU). Based on this, we design an SDFE module, as shown in Fig 2(b), which is densely connected by multiple feature enhancement units. Fig 3 gives two important components of the SDFE module.
In the SDFE module, each Feature Enhancement Unit (FEU) first derives local features with adaptive receptive fields using the Deformable Bottleneck Convolution (DBottConv), which begins with a pointwise convolution for channel compression to reduce computational overhead, followed by a deformable convolution to capture irregular morphological patterns in lesion regions, and concludes with another pointwise convolution to restore the original channel dimensions. Subsequently, group normalization and an activation function are applied to enhance training stability and convergence. Finally, the outputs from N FEUs are aggregated via dense connections to form the final output of the SDFE module.
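The core idea of DBottConv, sampling at learned fractional offsets via bilinear interpolation, can be illustrated for a single 3x3 tap. The helper names and the per-location formulation are illustrative sketches; the real module additionally wraps this between two pointwise convolutions, as described above:

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinearly sample a 2-D map at a fractional location (y, x)."""
    H, W = img.shape
    y = np.clip(y, 0.0, H - 1.001)
    x = np.clip(x, 0.0, W - 1.001)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0] + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0] + dy * dx * img[y0 + 1, x0 + 1])

def deformable_conv_at(img, weights, offsets, cy, cx):
    """3x3 deformable convolution response at one output location (cy, cx).

    offsets is a (3, 3, 2) array of learned (dy, dx) shifts per kernel tap."""
    out = 0.0
    for ky in range(3):
        for kx in range(3):
            dy, dx = offsets[ky, kx]
            out += weights[ky, kx] * bilinear(img, cy + ky - 1 + dy, cx + kx - 1 + dx)
    return out
```

With all-zero offsets the response equals an ordinary 3x3 convolution, which is exactly how deformable convolution generalizes the fixed sampling grid toward irregular lesion shapes.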
4.2.2 FD-GSFE module.
Global feature extraction via Transformer-based methods plays a pivotal role in breast cancer histopathological image classification. However, Transformers often lack inherent two-dimensional spatial awareness. To address this, we propose the FD-GSFE module, which employs a dual-branch architecture (Fig 2(c)). This framework comprises two complementary components: the Wavelet Low-Frequency (WLF) branch for global feature extraction and the Fourier High-Frequency (FHF) branch for spatial feature extraction. Together, these branches enable cooperative global dynamic focusing and enhanced 2D spatial perception, fostering robust pathological image analysis.
Suppose the output features generated by the SDFE module are $X \in \mathbb{R}^{H \times W \times C}$. To save computing resources, the input features are split along the channel dimension into two halves, $X_{WLF} \in \mathbb{R}^{H \times W \times C/2}$ and $X_{FHF} \in \mathbb{R}^{H \times W \times C/2}$, which serve as the inputs of the two branches. $X_{WLF}$ and $X_{FHF}$ are fed into the WLF branch and the FHF branch through a convolutional layer. Finally, the outputs of the two branches are concatenated at the channel level and represented as $X_{out} \in \mathbb{R}^{H \times W \times C}$.
A. WLF Branch:
As shown in Fig 4, for the input data $X_{WLF}$, we utilize the wavelet transform to perform frequency-domain decomposition:

$$X_{LL}, X_{LH}, X_{HL}, X_{HH} = \mathrm{DWT}(X_{WLF})$$

where DWT represents the wavelet transform, $X_{LL}$ represents the low-frequency component, and $X_{LH}$, $X_{HL}$, $X_{HH}$ represent the high-frequency components. Subsequently, global features are extracted from the low-frequency component, while the high-frequency components enhance lesion features:

$$X_{LL}' = \mathrm{STTR}(X_{LL}),\qquad X_{H}' = \mathrm{Conv}(X_{LH}, X_{HL}, X_{HH})$$

where STTR represents the Statistical Transformer and $X_{LL}'$ is the pathological feature after global dynamic focusing, which realizes the interaction and information exchange between pixels. Conv is composed of a convolution, ReLU activation, and normalization to enhance lesion features. Finally, the low-frequency and high-frequency features are concatenated on the channel, and the image is restored to the spatial domain using the inverse wavelet transform:

$$X_{WLF}' = \mathrm{IDWT}\big(\mathrm{Concat}(X_{LL}', X_{H}')\big)$$

where IDWT is the inverse wavelet transform and Concat represents the concatenation of the high-frequency and low-frequency features on the channel.
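The WLF branch data flow can be sketched with the Statistical Transformer (STTR) and the Conv enhancement passed in as callables; identity stand-ins recover the input exactly because the Haar variant used here is orthonormal. This is a single-channel simplification of the real module, not its implementation:

```python
import numpy as np

def dwt2(img):
    """One-level orthonormal Haar DWT of an (H, W) map."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    return ((a + b + c + d) / 2, (-a - b + c + d) / 2,
            (-a + b - c + d) / 2, (a - b - c + d) / 2)

def idwt2(ll, lh, hl, hh):
    """Inverse Haar transform back to the spatial domain."""
    out = np.zeros((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll - lh - hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll + lh + hl + hh) / 2
    return out

def wlf_branch(x, sttr, conv):
    """WLF skeleton: global focusing on the low band, enhancement on high bands."""
    ll, lh, hl, hh = dwt2(x)
    ll = sttr(ll)                                # Statistical Transformer on low freq
    lh, hl, hh = conv(lh), conv(hl), conv(hh)    # lesion enhancement on high freq
    return idwt2(ll, lh, hl, hh)
```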
Statistical Transformer: The traditional Transformer architecture consists of Layer Normalization (LN), Multi-Head Self-Attention (MSA), a Multi-Layer Perceptron (MLP), and Residual Connections (RES). However, MSA models only first-order feature similarities through pixel-wise dot products, limiting its ability to capture higher-order statistical dependencies. To address this, we introduce a Statistical Attention Mechanism (SAM) that constructs data-adaptive low-rank projections based on the empirical second-order moments of pixel features, enhancing high-order feature modeling [33]. Replacing MSA with SAM yields the Statistical Transformer, which effectively captures second-order statistics and deepens feature representations. Fig 5 illustrates the detailed process, while its mathematical formulation is expressed as follows.
The input features undergo initial Layer Normalization (LN), followed by processing through the Statistical Attention Mechanism (SAM) to capture global contextual interactions and produce an enhanced representation. This enhanced representation is then fused with the original input F via a pixel-wise residual connection, yielding an intermediate output.
Subsequently, the intermediate output F1 is subjected to another LN layer and a Multi-Layer Perceptron (MLP), with the result added to the intermediate F1 via a residual connection to generate the final output features of the Statistical Transformer.
where, F2 represents the output features of the Statistical Transformer.
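The two-step residual structure just described can be sketched with placeholder callables standing in for SAM and the MLP; the learnable affine parameters of LN are omitted for brevity:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Per-token normalization over the channel dimension (affine omitted)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def statistical_transformer_block(F, sam, mlp):
    """LN -> SAM -> residual, then LN -> MLP -> residual."""
    F1 = F + sam(layer_norm(F))    # intermediate output with residual connection
    F2 = F1 + mlp(layer_norm(F1))  # final output of the Statistical Transformer
    return F2
```

The wiring mirrors a standard pre-norm Transformer block, with SAM substituted for multi-head self-attention.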
B.FHF-Branch:
In breast cancer histopathological images, Fourier high-frequency enhancement is employed to accentuate fine structural details. Subsequently, a Depth Gradient Feature Extraction Module (DGFEM) is applied to these enhanced contours to capture intricate two-dimensional spatial features. Fig 6 illustrates the architecture of the FHF-Branch, where DGFEM forms the core component.
Assume the input image is denoted as $X_{FHF}$. First, the high-frequency image is derived via the Fourier transform:

$$X_{high} = \mathrm{IFFT}\big(H_{high}\cdot \mathrm{FFT}(X_{FHF})\big)$$

where FFT and IFFT represent the Fourier transform and its inverse. A $1\times1$ convolution then compresses the channels of $X_{high}$ to 1, yielding the feature map $X_{h} \in \mathbb{R}^{H \times W \times 1}$, which reduces computational overhead.

The Gradient Value Feature (GVF) and Gradient Direction Feature (GDF) are computed as:

$$\mathrm{GVF}, \mathrm{GDF} = \mathrm{DGFEM}(X_{h})$$

where DGFEM represents the depth gradient feature extraction module. The GVF emphasizes pathological cell and tissue edges, while the GDF captures the geometric spatial arrangement of boundaries across varying receptive fields, consisting of $X_S$ and $X_B$:

$$X_S, X_B = \mathrm{Split}(\mathrm{GDF})$$

where Split represents dividing the channel. Then, we split the GVF on the channel and add the resulting parts to $X_S$ and $X_B$ at the pixel level, injecting two-dimensional spatial information into each pixel:

$$X_{S1} = X_S + \mathrm{GVF}_S,\qquad X_{B1} = X_B + \mathrm{GVF}_B$$

where $X_{S1}$ and $X_{B1}$ represent the learned two-dimensional spatial features at different scales. Finally, these two features are concatenated at the channel level:

$$X_{FHF}' = \mathrm{Concat}(X_{S1}, X_{B1})$$
DGFEM Module: The traditional Histogram of Oriented Gradients (HOG) [26] highlights edge magnitudes via gradient values and reflects angular variations in gradient directions. However, it often overlooks diagonal responses (e.g., at 45° and 135°), leading to an inadequate representation of subtle morphological changes in complex pathological images. To mitigate this, the DGFEM module extends HOG by incorporating diagonal gradient values (along the 45° and 135° directions) and refining the direction calculation. Gradient values are enhanced via softmax to emphasize edges, while directions capture the geometric arrangement of cell and tissue boundaries. Dual convolutions with varying kernel sizes expand the receptive fields for spatial learning.
Assume the feature map is $F \in \mathbb{R}^{H \times W}$. Four directional operators compute the gradients along the horizontal, vertical, and two diagonal directions. The gradient value for each direction is obtained by convolving $F$ with the corresponding operator, and the gradient value of each pixel is then obtained by aggregating the four directional responses.

Edge enhancement yields the Gradient Value Feature (GVF) via softmax:

$$\mathrm{GVF} = \mathrm{softmax}(G)$$

where $G$ denotes the per-pixel gradient values. Next, the gradient values in the diagonal directions are projected onto the horizontal and vertical directions, and the gradient direction (GRD) $\theta$ is computed as:

$$\theta = \arctan\!\left(\frac{G_v}{G_h}\right)$$

where $G_h$ and $G_v$ are the projected horizontal and vertical gradients. The GRD forms the gradient direction matrix of the feature map. To derive deep spatial feature distributions, a small-kernel convolution (SConv) and a large-kernel convolution (BConv) are applied to this matrix, promoting multi-scale spatial learning. The resulting distributions are concatenated channel-wise and normalized to yield the Gradient Direction Feature (GDF).
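A sketch of the DGFEM gradient computation follows; the directional kernels, the scope of the softmax normalization, and the diagonal-projection signs are illustrative assumptions, since the exact operators are not reproduced here:

```python
import numpy as np

def conv2_same(img, k):
    """Naive 3x3 'same' correlation with zero padding."""
    H, W = img.shape
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = (p[i:i + 3, j:j + 3] * k).sum()
    return out

# Sobel-style operators for the horizontal, vertical, and two diagonal directions
# (assumed kernels, chosen to illustrate the four-direction extension of HOG).
K = {
    "h":    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float),
    "v":    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float),
    "d45":  np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], float),
    "d135": np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], float),
}

def dgfem(F):
    g = {name: conv2_same(F, k) for name, k in K.items()}
    # Project the diagonal gradients onto the horizontal/vertical axes.
    gh = g["h"] + (g["d45"] + g["d135"]) * np.cos(np.pi / 4)
    gv = g["v"] + (g["d45"] - g["d135"]) * np.sin(np.pi / 4)
    mag = np.sqrt(sum(v ** 2 for v in g.values()))  # per-pixel gradient value
    e = np.exp(mag - mag.max())
    gvf = e / e.sum()                               # softmax-enhanced GVF
    theta = np.arctan2(gv, gh)                      # gradient direction (GRD)
    return gvf, theta
```

The direction map `theta` would then be fed through the small- and large-kernel convolutions to produce the GDF.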
5 Experiments
5.1 Dataset
To ensure a comprehensive evaluation, we conduct experiments using both the BreakHis and BACH datasets. The BreakHis dataset contains 7,909 breast tumor histopathology images obtained under a microscope at four magnification levels (40×, 100×, 200×, and 400×). It includes both benign and malignant cases, which are further classified into eight subtypes based on pathological characteristics (Table 1). Depending on the task configuration, this dataset can be used for binary (benign vs. malignant) classification or eight-class classification. The BACH (Breast Cancer Histology) dataset contains 400 pathological images, categorized into four classes with 100 images each (Table 1). All images in both datasets were manually annotated by experienced pathologists to ensure high annotation reliability. It is worth noting that the BreakHis dataset exhibits significant class imbalance in the eight-class classification setting, which may influence model performance. However, this work focuses primarily on the model design for breast histopathology image classification and does not address data imbalance.
5.2 Experimental setup
The S-FDFEM performs binary and eight-class classification on the BreakHis dataset, and four-class classification on the BACH dataset. Data augmentation techniques are applied to expand the BreakHis dataset; the specific augmentation methods are rotation, translation, contrast enhancement, and saturation enhancement. Following the processing method in [36], we exclude patches containing fewer than 30 nuclei in BACH and regenerate data based on samples from each category. The parameters of S-FDFEM are shown in Table 2. The evaluation indicators used in this experiment are accuracy, precision, F1-score, and recall. To ensure a fair comparison, all models in the experiments were implemented using the same parameter configurations as the proposed S-FDFEM.
5.3 Experimental results of S-FDFEM model
5.3.1 BreakHis dataset.
Tables 3 and 4 present the binary and 8-class classification results of the S-FDFEM model on the BreakHis dataset, respectively. From Table 3, the classification accuracies reach 99.33%, 99.36%, 98.67%, and 98.53% at the 40×, 100×, 200×, and 400× magnifications, respectively, and the recalls, precisions, and F1-scores are consistently above 98%. From Table 4, the classification accuracies at the 40×, 100×, 200×, and 400× magnifications reach 94.11%, 92.91%, 91.17%, and 89.87%, respectively, and both recall and precision remain above 87%. Notably, the reduced amount of tissue information under a fixed cropping size and the severe data imbalance (for example, the per-class sample sizes in the test set vary from 33 to 268) cause the classification performance of S-FDFEM to decline in the eight-class setting.
5.3.2 BACH dataset.
The evaluation results on the BACH dataset further demonstrate the strong classification capability of the S-FDFEM model. From Table 5, the model achieves an accuracy of 99.35%, precision of 99.37%, recall of 99.24%, and an F1-score of 99.30%. These results confirm the model’s strong generalization ability on breast cancer histopathology images.
5.4 Comparison between SDFE module and other modules
To verify the effectiveness of the SDFE module, we remove the FD-GSFE module from the first and second stages and construct a new model consisting of only four SDFE modules, denoted as the SDFE model, which is compared and analyzed against ResNet34 [37], Inception [38], and MobileNet [39].
5.4.1 BreakHis dataset.
Experimental results on the BreakHis dataset are presented in Table 6. At 40× and 100× magnifications, the proposed SDFE model achieves noticeably higher accuracy, precision, recall, and F1-score than all compared modules. Although the Inception module shows a slight accuracy advantage (<0.5%) at 200× and 400× magnifications, the SDFE module still achieves the best overall classification performance across the comprehensive indicators. This advantage is attributed to the ability of the deformable bottleneck convolution module to accurately capture irregular pathological features.
5.4.2 BACH dataset.
The evaluation results on the BACH dataset are summarized in Table 7. ResNet34 achieves the highest accuracy (98.97%) and can accurately distinguish the different categories, while the SDFE model demonstrates more stable performance across accuracy, precision, recall, and F1-score. Compared with MobileNet and Inception, the SDFE model achieves consistently higher performance.
5.5 Ablation experiments
To validate the S-FDFEM model, we conducted ablation experiments on both BreakHis and BACH datasets. Table 8 shows the module configuration of the four models in the ablation experiments. The SDFE module is shared across all four models. Baseline, Model A, Model B, and Model C are obtained by selectively enabling the WLF and FHF branches, where Model C corresponds to the proposed S-FDFEM.
5.5.1 BreakHis dataset.
Tables 9 and 10 present the binary and 8-class classification results on the BreakHis dataset for the Baseline, Model A, Model B, and Model C. Fig 7 shows the comparison curves of the four models on the binary classification task of the BreakHis dataset.
At 40×, 100×, 200×, and 400× magnifications, the binary classification accuracies of the Baseline model on the BreakHis dataset are 97.82%, 97.44%, 97.07%, and 96.51%, respectively, and the eight-class accuracies are 91.41%, 90.66%, 89.33%, and 86.93%, respectively. These results indicate that deformable convolution enables the extraction of local irregular features from pathological images, improving local feature representation.
The WLF-branch leverages a statistical transformer to extract low-frequency global information in the wavelet frequency domain. With this branch added, Model A outperforms the Baseline by 0.48–1.29% in binary classification and 0.64–1.47% in eight-class classification, indicating enhanced global discrimination capability.
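The low-frequency input of the WLF-branch can be illustrated with a one-level Haar decomposition; `haar_ll` below is a hypothetical minimal sketch (the paper does not specify the wavelet basis here), exploiting the fact that the LL band of a Haar DWT is, up to a scale factor, the average over non-overlapping 2×2 blocks.

```python
import numpy as np

def haar_ll(img):
    """One-level Haar DWT low-frequency (LL) approximation of a 2-D array:
    average over non-overlapping 2x2 blocks (odd trailing rows/cols dropped)."""
    h, w = img.shape
    img = img[:h - h % 2, :w - w % 2].astype(float)
    # Each output pixel is the mean of one 2x2 block of the input
    return (img[0::2, 0::2] + img[0::2, 1::2]
            + img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
```

The halved-resolution LL map keeps the smooth global structure of the tissue, which is what the statistical transformer then attends over.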
The FHF-branch utilizes the Fourier high-frequency component together with the DGFEM module to capture the two-dimensional spatial distribution of pathological structures. With this branch added, Model B improves binary classification accuracy by 0.16–1.10% and eight-class accuracy by 0.50–1.29%, demonstrating strengthened spatial feature representation.
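The Fourier high-frequency component used by the FHF-branch can be sketched as a simple high-pass filter in the shifted spectrum; `fourier_high_freq` and its square mask of half-width `radius` are illustrative assumptions, since the paper does not state the exact cutoff shape.

```python
import numpy as np

def fourier_high_freq(img, radius):
    """High-frequency component of a 2-D image: zero out a centered
    low-frequency square of half-width `radius` in the shifted spectrum,
    then invert the transform."""
    f = np.fft.fftshift(np.fft.fft2(img.astype(float)))
    h, w = img.shape
    cy, cx = h // 2, w // 2
    f[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1] = 0  # high-pass mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```

Edges and fine cellular boundaries survive this filtering while smooth intensity variation is removed, which is the signal the DGFEM module operates on.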
Model C, which incorporates both the WLF-branch and FHF-branch, corresponds to the proposed S-FDFEM. It achieves the largest performance gain, improving binary classification accuracy by 1.51–2.02% and eight-class accuracy by 1.84–2.94%, confirming the complementary effectiveness of global and spatial-frequency feature fusion.
5.5.2 BACH dataset.
Table 11 presents the four-class classification results of the Baseline, Model A, Model B, and Model C on the BACH dataset. From Table 11, the Baseline model attains 97.26% accuracy; Model A introduces the wavelet low-frequency image and the statistical transformer module, improving the accuracy to 98.93% (+1.67%); Model B introduces the Fourier high-frequency components together with the DGFEM module, enhancing spatial characterization and yielding an accuracy of 98.21% (+0.95%); Model C combines the advantages of both branches and reaches 99.35% accuracy.
These experimental results verify the effectiveness and synergy of each module. In particular, all evaluation indicators of Model C exceed 99%, demonstrating the stability and reliability of its classification.
5.6 Comparison with classic models
5.6.1 BreakHis dataset.
The classic networks AlexNet [40], ConvNeXt-V2 [41], EfficientNet-V2 [42], MobileNet-V3 [43], ShuffleNet-V2 [44], Vision Transformer [45], ResNet34 [37], VGG19 [46], and Swin Transformer [47] are chosen to comprehensively verify the performance of the proposed S-FDFEM. Tables 12 and 13 show the comparative results of binary and eight-class classification on the BreakHis dataset, and Fig 8 visualizes the binary classification indicators. From Table 12, MobileNet-V3 achieves accuracies of 97.32%, 96.63%, 96.35%, and 95.05% at 40×, 100×, 200×, and 400× magnifications, respectively. The S-FDFEM model performs well and accurately captures the features of complex pathological images: its accuracy at the four magnifications is 2.01%, 2.73%, 2.32%, and 3.48% higher than that of MobileNet-V3. From Table 13, although the indicators of all models are lower on the eight-class task, S-FDFEM still ranks highest, improving accuracy by 0.37–0.96%, precision by 0.39–2.08%, recall by 0.58–2.00%, and F1-score by 1.19–1.79% over MobileNet-V3.
Fig 9 shows the ROC curves of the classic models and S-FDFEM on the BreakHis dataset. The proposed S-FDFEM achieves AUC values of 99.95 (40×), 99.68 (100×), 99.61 (200×), and 99.58 (400×), consistently exceeding the classic models across all magnification levels, which demonstrates its stable discriminative ability in distinguishing histopathological samples.
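The binary AUC values reported above can be understood through the rank-sum (Mann-Whitney) formulation of ROC AUC; `auc_score` below is an illustrative numpy sketch assuming no tied scores, not the evaluation code used in the experiments.

```python
import numpy as np

def auc_score(y_true, scores):
    """Binary ROC AUC via the Mann-Whitney rank-sum statistic
    (assumes no tied scores)."""
    y_true = np.asarray(y_true)
    ranks = np.argsort(np.argsort(scores)) + 1  # ranks starting at 1
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    # AUC = P(score of a random positive > score of a random negative)
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUC near 1.0 thus means almost every malignant sample is scored above every benign one, independent of any decision threshold.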
5.6.2 BACH dataset.
Table 14 shows the comparative results between S-FDFEM and the classical networks of Sect 5.6.1 on the BACH dataset, and Fig 10 illustrates the corresponding ROC curves. From Table 14, S-FDFEM achieves the best overall performance, with ResNet34 ranking second. Though the accuracy gap is small, S-FDFEM shows a more comprehensive advantage: its accuracy (99.35%), precision (99.37%), recall (99.24%), and F1-score (99.30%) all exceed 99% and remain stable. From Fig 10, the AUC value of S-FDFEM reaches 99.99, clearly better than the other models. These results verify the effectiveness of the spatial–frequency domain collaborative design in enhancing classification robustness and reducing the risk of misdiagnosis.
5.7 Comparison with existing models
We choose the existing models DsHoNet [23], MAW-BMRSFAN [25], DenseNet201+XGBoost [48], Multi-Level Feature Fusion [49], PCSAM-ResCBAM [50], MSFEL-DAAMS [51], GLNET [52], and FCCS-Net [53] for comparison with S-FDFEM on binary classification. Because the eight-class images in the BreakHis dataset suffer from severe data imbalance, which strongly affects classification results, and this paper does not correct that imbalance, we selected four models that likewise leave the imbalance untreated to ensure a valid and fair comparison: ResNet-FRLM [54], IDSNet [55], SE-ResNet [56], and AFFNet [57].
From Table 15, the proposed S-FDFEM exhibits significant classification advantages at 40×, 100×, and 200× magnifications, with accuracies 1.55%, 0.96%, and 0.79% higher than those of the suboptimal model DsHoNet. Only at 400× magnification does DsHoNet hold a slight 0.34% accuracy advantage over S-FDFEM. Overall, the S-FDFEM model therefore outperforms the other models.
From Table 16, S-FDFEM exhibits the highest classification performance at 40× magnification, surpassing SE-ResNet by 0.37%. SE-ResNet performs slightly better than S-FDFEM at 100×, 200×, and 400× magnifications, with the difference remaining within 1%. These results indicate that the S-FDFEM method maintains strong competitiveness and overall robustness.
In summary, the proposed S-FDFEM demonstrates consistent and robust performance in both binary and multi-class breast cancer classification tasks, maintaining a competitive advantage over most existing approaches. This confirms its effectiveness and adaptability in analyzing breast cancer histopathological images.
5.8 The impact of different loss functions
We conducted comparative experiments on the BreakHis dataset for the eight-class classification task using both Cross-entropy loss and Focal loss. As shown in Table 17, the conventional cross-entropy loss leads to a noticeable decline in classification performance under imbalanced data, as it tends to bias the model toward majority classes. In contrast, the focal loss alleviates this issue by adaptively down-weighting easily classified samples, thereby guiding the model to focus more on minority and hard-to-classify categories. Consequently, the focal loss enables the model to learn more discriminative feature representations and improves overall classification robustness.
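The down-weighting behavior of focal loss described above can be sketched directly from its definition, FL = -(1 - p_t)^γ log(p_t); `focal_loss` is a minimal numpy illustration over predicted class probabilities, not the training code used in the experiments.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Multi-class focal loss over predicted probabilities.
    With gamma=0 this reduces exactly to cross-entropy."""
    p_t = probs[np.arange(len(labels)), labels]  # probability of the true class
    p_t = np.clip(p_t, 1e-12, 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (high p_t) samples
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))
```

Because the modulating factor (1 - p_t)^γ is near zero for easy samples, gradient updates are dominated by minority and hard-to-classify examples, matching the behavior observed in Table 17.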
6 Visualization
We employed Grad-CAM visualization on the BreakHis dataset, as shown in Fig 11, which illustrates the model's attentional focus on pathological images. The results indicate that the S-FDFEM model accurately locates pathological areas with irregular shapes, owing to its recognition of the spatial distribution of cells and tissues and its adaptive learning of irregular contours. This highlights the robustness of the model in capturing fine-grained spatial features, which is crucial for improving diagnostic accuracy in histopathological analysis.
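The core Grad-CAM computation behind these heatmaps can be sketched in a few lines; `grad_cam` is an illustrative numpy version of the standard formulation (channel weights from globally averaged gradients, ReLU of the weighted activation sum), operating on arrays one would extract from the last convolutional layer.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map from last-conv activations and their gradients,
    both of shape (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))  # alpha_k: spatial average of gradients
    # ReLU keeps only features with a positive influence on the target class
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1] for overlaying on the image
    return cam
```

The resulting map is upsampled to the input resolution and overlaid on the pathology patch, as in Fig 11.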
7 Conclusion
The proposed S-FDFEM integrates spatial and frequency domain information to improve the automated classification of breast cancer histopathological images. In the spatial domain, deformable bottleneck convolution extracts irregular local features from pathological images. In the frequency domain, wavelet and Fourier transforms capture global and spatial characteristics: wavelet low-frequency components are combined with a statistical transformer to establish long-range dependencies and extract key pathological features, while Fourier high-frequency components are paired with a deep gradient feature extraction module to encode spatial relationships, enhancing discriminative feature representation. By fusing these complementary features, S-FDFEM generates richer and more distinctive deep representations. Evaluated on the BreakHis dataset at 40×, 100×, 200×, and 400× magnifications, S-FDFEM achieves accuracies of 99.33%, 99.36%, 98.64%, and 98.53% on binary classification, and 94.11%, 92.91%, 91.17%, and 89.87% on eight-class classification, respectively. Additionally, it attains 99.35% accuracy on the BACH dataset. These results demonstrate the robustness of S-FDFEM and highlight the potential of hybrid spatial–frequency approaches to enhance breast cancer pathological image classification. In the future, we will address the unresolved issue of data imbalance, reduce the impact of data distribution, fully exploit the potential of the model, and further improve classification performance.
Acknowledgments
We thank the reviewers for their comments and suggestions. This research received no financial support.
References
- 1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49. pmid:33538338
- 2. Vuong D, Simpson PT, Green B, Cummings MC, Lakhani SR. Molecular classification of breast cancer. Virchows Arch. 2014;465(1):1–14. pmid:24878755
- 3. Armenta-Guirado BI, González-Rocha A, Mérida-Ortega Á, López-Carrillo L, Denova-Gutiérrez E. Lifestyle quality indices and female breast cancer risk: a systematic review and meta-analysis. Adv Nutr. 2023;14(4):685–709. pmid:37085092
- 4. Yang J, Ju J, Guo L, Ji B, Shi S, Yang Z, et al. Prediction of HER2-positive breast cancer recurrence and metastasis risk from histopathological images and clinical information via multimodal deep learning. Comput Struct Biotechnol J. 2021;20:333–42. pmid:35035786
- 5. Gruber IV, Rueckert M, Kagan KO, Staebler A, Siegmann KC, Hartkopf A, et al. Measurement of tumour size with mammography, sonography and magnetic resonance imaging as compared to histological tumour size in primary breast cancer. BMC Cancer. 2013;13:328. pmid:23826951
- 6. Hammad M, Bakrey M, Bakhiet A, Tadeusiewicz R, El-Latif AAA, Pławiak P. A novel end-to-end deep learning approach for cancer detection based on microscopic medical images. Biocybernetics and Biomedical Engineering. 2022;42(3):737–48.
- 7. Nassar A. Core needle biopsy versus fine needle aspiration biopsy in breast–a historical perspective and opportunities in the modern era. Diagn Cytopathol. 2011;39(5):380–8. pmid:20949457
- 8. Huang X, Chen J, Chen M, Wan Y, Chen L. FRE-Net: full-region enhanced network for nuclei segmentation in histopathology images. Biocybernetics and Biomedical Engineering. 2023;43(1):386–401.
- 9. Hanna MG, Parwani A, Sirintrapun SJ. Whole slide imaging: technology and applications. Adv Anat Pathol. 2020;27(4):251–9. pmid:32452840
- 10. Wen X, Guo X, Wang S, Lu Z, Zhang Y. Breast cancer diagnosis: a systematic review. Biocybernetics and Biomedical Engineering. 2024;44(1):119–48.
- 11. Xu C, Yi K, Jiang N, Li X, Zhong M, Zhang Y. MDFF-Net: a multi-dimensional feature fusion network for breast histopathology image classification. Comput Biol Med. 2023;165:107385. pmid:37633086
- 12. George K, Sankaran P, K PJ. Computer assisted recognition of breast cancer in biopsy images via fusion of nucleus-guided deep convolutional features. Comput Methods Programs Biomed. 2020;194:105531. pmid:32422473
- 13. Li X, Shen X, Zhou Y, Wang X, Li T-Q. Classification of breast cancer histopathological images using interleaved DenseNet with SENet (IDSNet). PLoS One. 2020;15(5):e0232127. pmid:32365142
- 14. Joseph N, Gupta R. Dual multi-scale CNN for multi-layer breast cancer classification at multi-resolution. In: 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N). 2022. p. 613–8. https://doi.org/10.1109/icac3n56670.2022.10073985
- 15. Li M, Zhang B, Sun J, Zhang J, Liu B, Zhang Q. Weakly supervised breast cancer classification on WSI using transformer and graph attention network. Int J Imaging Syst Tech. 2024;34(4):e23125.
- 16. Hao S, Jia Y, Liu J, Wang Z, Liu C, Ji Z, et al. ST-Double-Net: a two-stage breast tumor classification model based on swin transformer and weakly supervised target localization. IEEE Access. 2024;12:117921–33.
- 17. Gao C, Sun Q, Zhu W, Zhang L, Zhang J, Liu B, et al. Transformer based multiple instance learning for WSI breast cancer classification. Biomedical Signal Processing and Control. 2024;89:105755.
- 18. Wang S, Rong R, Zhou Q, Yang DM, Zhang X, Zhan X, et al. Deep learning of cell spatial organizations identifies clinically relevant insights in tissue images. Nat Commun. 2023;14(1):7872. pmid:38081823
- 19. Abbasniya MR, Sheikholeslamzadeh SA, Nasiri H, Emami S. Classification of breast tumors based on histopathology images using deep features and ensemble of gradient boosting methods. Computers and Electrical Engineering. 2022;103:108382.
- 20. Ogundokun RO, Owolawi PA, Tu C. Optimized deep feature learning with hybrid ensemble soft voting for early breast cancer histopathological image classification. Computers, Materials & Continua. 2025;84(3).
- 21. Maleki A, Raahemi M, Nasiri H. Breast cancer diagnosis from histopathology images using deep neural network and XGBoost. Biomedical Signal Processing and Control. 2023;86:105152.
- 22. He Z, Lin M, Xu Z, Yao Z, Chen H, Alhudhaif A, et al. Deconv-transformer (DecT): a histopathological image classification model for breast cancer based on color deconvolution and transformer architecture. Information Sciences. 2022;608:1093–112.
- 23. Zou Y, Chen S, Che C, Zhang J, Zhang Q. Breast cancer histopathology image classification based on dual-stream high-order network. Biomedical Signal Processing and Control. 2022;78:104007.
- 24. Liu L, Wang Y, Zhang P, Qiao H, Sun T, Zhang H, et al. Collaborative transfer network for multi-classification of breast cancer histopathological images. IEEE J Biomed Health Inform. 2024;28(1):110–21. pmid:37294651
- 25. Hou Y, Zhang W, Cheng R, Zhang G, Guo Y, Hao Y, et al. Meta-adaptive-weighting-based bilateral multi-dimensional refined space feature attention network for imbalanced breast cancer histopathological image classification. Comput Biol Med. 2023;164:107300. pmid:37557055
- 26. Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). 2005. p. 886–93. https://doi.org/10.1109/cvpr.2005.177
- 27. Newlin Shebiah R, Aruna Sangari A. Classification of human body parts using histogram of oriented gradients. In: 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS). 2019. p. 958–61. https://doi.org/10.1109/icaccs.2019.8728328
- 28. Wei Y, Kun H, Sheng ZY. The study on polarized spectral identification of dry plants and bare soils based on histogram of oriented gradient. Journal of Infrared and Millimeter Waves. 2019;38(3):365–70.
- 29. Li J, Wang K, Jiang X. Robust multi-subtype identification of breast cancer pathological images based on a dual-branch frequency domain fusion network. Sensors (Basel). 2025;25(1):240. pmid:39797031
- 30. Yan Y, Lu R, Sun J, Zhang J, Zhang Q. Breast cancer histopathology image classification using transformer with discrete wavelet transform. Med Eng Phys. 2025;138:104317. pmid:40180530
- 31. Tan J, Pei S, Qin W, Fu B, Li X, Huang L. Wavelet-based mamba with fourier adjustment for low-light image enhancement. In: Proceedings of the Asian Conference on Computer Vision. 2024. p. 3449–64.
- 32. Liu M, Liu Y, Xu P, Cui H, Ke J, Ma J. Exploiting geometric features via hierarchical graph pyramid transformer for cancer diagnosis using histopathological images. IEEE Trans Med Imaging. 2024;43(8):2888–900. pmid:38530716
- 33. Wu Z, Ding T, Lu Y, Pai D, Zhang J, Wang W. Token statistics transformer: linear-time attention via variational rate reduction. arXiv preprint 2024. https://arxiv.org/abs/2412.17810
- 34. Zhu X, Hu H, Lin S, Dai J. Deformable convnets v2: more deformable, better results. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. p. 9308–16.
- 35. Liu H, Jia C, Shi F, Cheng X, Chen S. SCSegamba: lightweight structure-aware vision mamba for crack segmentation in structures. In: Proceedings of the Computer Vision and Pattern Recognition Conference. 2025. p. 29406–16.
- 36. Golatkar A, Anand D, Sethi A. Classification of breast cancer histology using deep learning. In: International Conference on Image Analysis and Recognition. 2018. p. 837–44.
- 37. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 770–8. https://doi.org/10.1109/cvpr.2016.90
- 38. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. p. 1–9.
- 39. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint 2017. https://arxiv.org/abs/1704.04861
- 40. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. 2012;25.
- 41. Woo S, Debnath S, Hu R, Chen X, Liu Z, Kweon IS, et al. ConvNeXt V2: co-designing and scaling convnets with masked autoencoders. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023. p. 16133–42. https://doi.org/10.1109/cvpr52729.2023.01548
- 42. Tan M, Le Q. EfficientNetV2: smaller models and faster training. In: International Conference on Machine Learning. 2021. p. 10096–106.
- 43. Howard A, Sandler M, Chen B, Wang W, Chen L-C, Tan M, et al. Searching for MobileNetV3. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 2019. p. 1314–24. https://doi.org/10.1109/iccv.2019.00140
- 44. Ma N, Zhang X, Zheng HT, Sun J. ShuffleNet v2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018. p. 116–31.
- 45. Dosovitskiy A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint 2020. https://arxiv.org/abs/2010.11929
- 46. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint 2014. https://arxiv.org/abs/1409.1556
- 47. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z. Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. p. 10012–22.
- 48. Maleki A, Raahemi M, Nasiri H. Breast cancer diagnosis from histopathology images using deep neural network and XGBoost. Biomedical Signal Processing and Control. 2023;86:105152.
- 49. Taheri S, Golrizkhatami Z, Basabrain AA, Hazzazi MS. A comprehensive study on classification of breast cancer histopathological images: binary versus multi-category and magnification-specific versus magnification-Independent. IEEE Access. 2024;12:50431–43.
- 50. Yan T, Chen G, Zhang H, Wang G, Yan Z, Li Y, et al. Convolutional neural network with parallel convolution scale attention module and ResCBAM for breast histology image classification. Heliyon. 2024;10(10):e30889. pmid:38770292
- 51. Li W, Long H, Zhan X, Wu Y. MDAA: multi-scale and dual-adaptive attention network for breast cancer classification. SIViP. 2024;18(4):3133–43.
- 52. Khan SUR, Zhao M, Asif S, Chen X, Zhu Y. GLNET: global–local CNN’s-based informed model for detection of breast cancer categories from histopathological slides. J Supercomput. 2023;80(6):7316–48.
- 53. Maurya R, Pandey NN, Dutta MK, Karnati M. FCCS-Net: breast cancer classification using multi-level fully convolutional-channel and spatial attention-based transfer learning approach. Biomedical Signal Processing and Control. 2024;94:106258.
- 54. Ma S, Yang B, Tian G, Li X. Classification of breast cancer pathological images combining fine-grained region location. In: Fourteenth International Conference on Graphics and Image Processing (ICGIP 2022). 2023. p. 159. https://doi.org/10.1117/12.2680490
- 55. Li X, Shen X, Zhou Y, Wang X, Li T-Q. Classification of breast cancer histopathological images using interleaved DenseNet with SENet (IDSNet). PLoS One. 2020;15(5):e0232127. pmid:32365142
- 56. Jiang Y, Chen L, Zhang H, Xiao X. Breast cancer histopathological image classification using convolutional neural networks with small SE-ResNet module. PLoS One. 2019;14(3):e0214587. pmid:30925170
- 57. Li J, Wang K, Jiang X. Robust multi-subtype identification of breast cancer pathological images based on a dual-branch frequency domain fusion network. Sensors (Basel). 2025;25(1):240. pmid:39797031