Abstract
Rolling bearings are core transmission components of large rotating machinery such as wind turbine gearboxes and aircraft engines. Timely and effective condition monitoring and diagnosis are therefore crucial for keeping equipment running stably, reducing maintenance costs, and improving production efficiency. However, noise in industrial settings often masks the original characteristics of bearing fault signals, which makes deep learning-based fault diagnosis models unreliable under strong industrial background noise. To address this problem, this paper proposes a multi-domain collaborative denoising diagnostic model based on a dynamic inter-domain attention mechanism and a noise-aware loss function. First, the model extracts high-dimensional features of bearing fault signals from multiple domains, namely the time and frequency domains. This increases the richness and diversity of the features and thereby suppresses the effect of noise on the diagnostic results. Second, a dynamic inter-domain attention mechanism (DIDAM) is proposed to weigh the importance of information in the different signal domains and integrate it flexibly, enabling more efficient and accurate multi-domain information fusion. Finally, a noise-aware loss function (NALF) is designed to keep excessive noise from leading the model into wrong decisions during training. Experimental results on two publicly available datasets, CWRU and MFPT, show that even in an extreme noise environment with SNR = –10 dB, the proposed model still achieves fault diagnosis accuracies of 81.25% and 76.36%, respectively, outperforming most existing mainstream denoising models. Overall, the proposed method performs well under substantial noise interference and offers a new approach to intelligent bearing fault diagnosis in real industrial scenarios.
Citation: Cao W, Zhang L (2025) A multi-domain collaborative denoising bearing fault diagnosis model based on dynamic inter-domain attention mechanism and noise-aware loss function. PLoS One 20(6): e0326666. https://doi.org/10.1371/journal.pone.0326666
Editor: Burak Erkayman, Ataturk University, TÜRKIYE
Received: March 18, 2025; Accepted: June 3, 2025; Published: June 26, 2025
Copyright: © 2025 Cao, Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The CWRU and MFPT datasets used in this study are available on Figshare: the CWRU dataset can be downloaded from https://10.6084/m9.figshare.28606778, and the MFPT dataset can be downloaded from http://10.6084/m9.figshare.28606802.
Funding: This research was supported by the Research and Innovation Team Project of Neijiang Normal University (No. 18TD02). We confirm that the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Rolling bearings are widely used to support rotating shafts and their components because of their high load-bearing capacity, high motion accuracy, and low friction loss. For this reason they are known as the “joints” of rotating machinery [1]. However, rolling bearings usually operate in complex environments that make them prone to failure, which may cause equipment downtime and even serious safety accidents [2–4]. Consequently, the condition of rolling bearings directly affects the operating efficiency, reliability, and safety of mechanical systems, and timely, effective monitoring and diagnosis are essential for ensuring normal equipment operation [5, 6].
Signal processing and machine learning are two common strategies in bearing fault diagnosis. For instance, Wang et al. [7] proposed a fault diagnosis method based on transfer learning of feature vectors. In this method, features are extracted from vibration signals using statistics and wavelet decomposition, and the ReliefF algorithm is then used to evaluate and select sensitive fault features. Salunkhe et al. [8] pointed out that the dynamic characteristics of rolling bearings are mainly influenced by their geometric structure and operating parameters, and they developed a bearing fault diagnosis method based on the Hilbert-Huang transform. Thamba et al. [9] developed a new method for analyzing the fault patterns of self-aligning bearings and achieved automatic fault classification using artificial neural networks and deep neural networks. Goyal et al. [10] denoised signals using the Hilbert transform and applied principal component analysis for dimensionality reduction. They then selected the optimal feature set with sequential floating forward selection and finally fed these features into support vector machines and artificial neural networks for fault recognition.
Signal processing-based methods often rely on prior knowledge of, or assumptions about, the signal features. As a result, they may fail to extract relevant features from nonlinear and non-stationary signals, which reduces diagnostic accuracy. Machine learning-based methods automate the classification step, but in many cases they still require manual feature engineering, which demands expert knowledge and is time-consuming. Moreover, inappropriate feature selection can degrade model performance.
In recent years, computer hardware has advanced continuously and cutting-edge theories such as artificial intelligence have matured. As a result, deep learning has become the dominant approach in bearing diagnosis research owing to its end-to-end training and outstanding feature representation capabilities [11]. Huang et al. [12] pointed out that traditional Convolutional Neural Networks (CNNs) are prone to falling into local optima when the input signals lack useful information. To address this issue, they proposed the Multi-scale Cascaded Convolutional Neural Network (MC-CNN). This network enhances the diversity of signal representation in the frequency domain by introducing filters of different scales, thereby providing more valuable information for fault diagnosis. He et al. [13] argued that the fixed convolution kernel parameters of traditional CNNs limit their ability to extract key features from fault signals. They therefore introduced IMSCNN, which uses dilated convolution kernels with different dilation rates to extract multi-scale features and improve bearing fault diagnosis. Song et al. [14] proposed widening convolution kernels to expand the receptive field and designed the WKCNN network structure based on this idea. Experimental results show that WKCNN outperforms other methods in diagnostic accuracy and timeliness. Hou et al. [15] proposed Diagnosisformer, an attention-based multi-feature parallel fusion model for rolling bearing fault diagnosis. It extracts frequency-domain features through the Fast Fourier Transform (FFT) and uses a multi-feature fusion module to capture information from different receptive fields, extract essential features, and maintain global dependencies. Guo et al. [16] proposed ACNN-BiLSTM, an end-to-end fault diagnosis method that combines an attention CNN with a Bidirectional Long Short-Term Memory network (BiLSTM). This method extracts short-term spatiotemporal features through one-dimensional wide convolution. It then introduces a convolutional block attention module to adjust the weights of different feature dimensions dynamically, and finally feeds the weighted features into the BiLSTM for fault diagnosis. Liao et al. [17] proposed MSRN-EGRU, a hybrid deep learning model that combines a Multi-scale Residual Neural Network (MSRN) with an Enhanced Gated Recurrent Unit (EGRU). Experimental results demonstrate the model’s accuracy, robustness, and convergence in bearing fault diagnosis.
However, the studies above did not consider the noisy, harsh working conditions that bearings face in industrial environments. In industrial applications, bearings often operate against strong background noise. Sources of this noise include high-speed equipment operation, the external environment, and vibrations of other mechanical components [18–20]. Such noise can mask the original characteristics of bearing fault signals and make fault types difficult to identify, significantly degrading the accuracy and robustness of deep learning-based fault diagnosis methods [21, 22].
Researchers have proposed a series of anti-noise models to address noise interference in bearing fault diagnosis. Li et al. [23] proposed a clustering algorithm based on deep convolutional neural networks. It reduces the discrepancy between training and testing data by minimizing intra-cluster variance and maximizing inter-cluster variance, thereby enhancing the model’s robustness under noise and varying working conditions. Chen et al. [24] introduced the MCNN-LSTM model, which uses multi-scale convolution kernels to extract the high- and low-frequency components of vibration signals and identifies the fault type with a Long Short-Term Memory network (LSTM). Hakim et al. [25] transformed signals from the time domain to the frequency domain using the FFT and separated amplitude and phase using a phase representation. They then fed the result into a 1D-CNN to enhance the model’s anti-noise capability. Li et al. [26] proposed an end-to-end Adaptive Multi-scale Fully Convolutional Network (AMFCN). It improves adaptability to noise through random sampling and uses large convolution kernels for wide-range temporal feature extraction. This design suppresses high-frequency noise, significantly improves anti-noise performance and robustness, and outperforms traditional CNNs and other multi-scale CNN models. Wang et al. [27] designed an Adaptive Denoising Convolutional Neural Network (ADCNN), which removes noise through adaptive denoising units while preserving key fault features. The network further enhances anti-noise capability by reducing the number of channels and increasing the size of the convolution kernels. Han et al. [28] proposed a Deep Residual Multi-scale Convolutional Neural Network with an attention mechanism (Attention-MSCNN). It removes noise through residual connections and a denoising multi-head attention mechanism and captures relationships in long sequences. Gao et al. [29] proposed an Adaptive Global-Local Denoising Multi-Time Scale Attention Residual Shrinkage Network (AMARSN). It enhances discriminability under noise through a multi-time-scale attention learning module and removes noise from multi-scale fault features through an adaptive denoising module.
Although existing denoising models have made some progress, they still have several shortcomings in strong noise environments. First, most methods process only time-domain signals and neglect key information contained in other domains. As a result, these models have difficulty extracting the original features of fault signals under strong background noise, which limits their ability to express the high-dimensional features of different fault patterns. Second, some methods do extract signal features from different domains, but they rely on fixed feature concatenation or fusion schemes. These schemes do not account for the differing contributions of features from different signal domains to model performance, which leads to poor multi-source information fusion. Finally, existing denoising models typically update network parameters with the standard cross-entropy loss. This ignores the possibility that strong noise may lead the model to incorrect decisions and thus bias the direction of gradient updates. Therefore, this paper proposes a multi-domain collaborative denoising bearing fault diagnosis model based on a dynamic inter-domain attention mechanism and a noise-aware loss function, with the following contributions:
- A multi-domain collaborative denoising bearing fault diagnosis model is proposed, which can simultaneously extract high-dimensional bearing fault signal features from different domains. The model enhances the richness and diversity of high-dimensional feature expression in the original signals, effectively improving the model’s fault recognition ability under substantial noise interference.
- Given the expressive differences and complementary relationships among multi-domain features, a dynamic inter-domain attention mechanism is designed to achieve more efficient information fusion. This mechanism can flexibly adjust the fusion approach according to the importance of information in each signal domain, improving the efficiency and accuracy of information fusion.
- Excessive noise may lead the model to make incorrect decisions and bias the direction of gradient updates. To counter this, a noise-aware loss function is constructed. This loss function is designed to reduce the gradient-direction deviations caused by strong noise, thereby enhancing the stability and reliability of fault diagnosis.
Fundamental theories
This section elaborates on the three fundamental theories that underpin this research in terms of principles and functionality: the Fast Fourier Transform, the Squeeze-and-Excitation Attention Mechanism, and the Cross-Entropy Loss Function. These theories provide crucial theoretical support and inspiration for the model proposed in this paper and facilitate a deeper understanding of its design and implementation.
Fast Fourier transform
In fault diagnosis of rotating machinery, time-domain vibration signals often exhibit nonlinear and non-stationary characteristics, and their transient impact components are easily corrupted by background noise. Traditional time-domain analysis methods are sensitive to random noise, which makes it difficult to extract periodic fault features reliably.
As a core tool in signal processing, the Discrete Fourier Transform (DFT) [30] converts a discrete signal from the time domain to the frequency domain. This allows frequency-domain features to be extracted and provides mathematical support for fault diagnosis in noisy environments. Specifically, for a discrete signal x(n) of length N, its DFT is defined as follows:
$$X(k)=\sum_{n=0}^{N-1}x(n)\,W_N^{nk},\qquad k=0,1,\ldots,N-1 \tag{1}$$
Here, X(k) represents the spectral value of the signal at the k-th discrete frequency point. $W_N^{nk}$ is the rotation (twiddle) factor, an element of the DFT matrix, which can also be expressed as:
$$W_N^{nk}=e^{-j\frac{2\pi}{N}nk} \tag{2}$$
However, computing the DFT directly requires $N^2$ complex multiplications and $N(N-1)$ complex additions, which imposes a substantial computational burden for large N. The Fast Fourier Transform (FFT) [31] is an efficient algorithm for computing the DFT. It employs a divide-and-conquer strategy that reduces the computational cost from $O(N^2)$ to $O(N\log N)$.
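Specifically, the radix-2 FFT, which requires N to be a power of 2, decomposes a DFT of length N into two DFTs of length N/2. It does this by splitting the length-N summation into even- and odd-indexed parts, as follows:
$$X(k)=\sum_{r=0}^{N/2-1}x(2r)\,W_N^{2rk}+\sum_{r=0}^{N/2-1}x(2r+1)\,W_N^{(2r+1)k} \tag{3}$$
Using the periodicity and symmetry of the rotation factor, it can be deduced that:
$$W_N^{2rk}=W_{N/2}^{rk},\qquad W_N^{k+N/2}=-W_N^{k} \tag{4}$$
Substituting Eq 4 into Eq 3 yields:
$$X(k)=X_1(k)+W_N^{k}X_2(k),\qquad X\left(k+\tfrac{N}{2}\right)=X_1(k)-W_N^{k}X_2(k),\qquad k=0,1,\ldots,\tfrac{N}{2}-1 \tag{5}$$
Here, $X_1(k)=\sum_{r=0}^{N/2-1}x(2r)\,W_{N/2}^{rk}$ represents the DFT computed for the even-indexed terms, and $X_2(k)=\sum_{r=0}^{N/2-1}x(2r+1)\,W_{N/2}^{rk}$ represents the DFT computed for the odd-indexed terms. In this manner, a DFT of length N is computed from two DFTs of length N/2, with one complex multiplication per output pair. Applying this split recursively eventually reduces the problem to length-2 DFTs, which are computed directly.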
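The even/odd split in Eqs 3 to 5 can be checked numerically. The plain-Python sketch below is for illustration only. It is not the implementation used in this paper, and practical code would call an optimized library FFT. The sketch evaluates Eq 1 directly and also through the recursive radix-2 butterfly. Its recursion stops at length 1 rather than 2, which is equivalent. The function names `dft` and `fft` are illustrative.

```python
import cmath


def dft(x):
    """Direct evaluation of Eq 1: X(k) = sum_n x(n) W_N^{nk}, W_N^{nk} = exp(-j 2 pi nk / N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (Eqs 3-5); len(x) must be a power of 2."""
    n_pts = len(x)
    if n_pts == 0 or n_pts & (n_pts - 1):
        raise ValueError("length must be a power of 2")
    if n_pts == 1:
        return [complex(x[0])]       # a length-1 DFT is the sample itself
    x1 = fft(x[0::2])                # X1(k): DFT of the even-indexed samples
    x2 = fft(x[1::2])                # X2(k): DFT of the odd-indexed samples
    half = n_pts // 2
    out = [0j] * n_pts
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n_pts) * x2[k]   # W_N^k * X2(k), computed once
        out[k] = x1[k] + t                                  # X(k)       = X1(k) + W_N^k X2(k)
        out[k + half] = x1[k] - t                           # X(k + N/2) = X1(k) - W_N^k X2(k)
    return out


signal = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 2.0]
max_err = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
print(max_err < 1e-9)  # True
```

For N = 1,024, which is the segment length used later in this paper, direct evaluation needs N² = 1,048,576 complex multiplications. By comparison, N log₂ N = 10,240, which illustrates the O(N²) versus O(N log N) gap.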
Squeeze-and-excitation attention mechanism
The attention mechanism mimics the way humans selectively process external information. It focuses on the key regions of the input data, allowing the model to highlight important information adaptively.
The Squeeze-and-Excitation (SE) attention mechanism proposed by Hu et al. [32] recalibrates the importance of feature channels through global information squeezing (Squeeze) and channel-wise weight excitation (Excitation), thereby enhancing the network’s ability to focus on key features. As shown in Fig 1, the SE mechanism consists of three steps: Squeeze, Excitation, and Scale.
Specifically, the Squeeze operation extracts global spatial information from each channel through global average pooling, producing a channel descriptor $Z_c$ for each channel c:
$$Z_c=F_{sq}(U_c)=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}U_c(i,j) \tag{6}$$
In Eq 6, $U_c$ denotes the c-th channel (of size H × W) of the input feature map $U\in\mathbb{R}^{C\times H\times W}$. The descriptor vector $Z=[Z_1,\ldots,Z_C]$ is then fed into the Excitation module, which consists of two fully connected layers. Through dimensionality reduction and expansion, this module learns the dependencies between channels and produces a weight $S_c$ for each channel:
$$S=F_{ex}(Z)=\sigma\left(W_2\,\delta(W_1 Z)\right) \tag{7}$$
Here, $W_1\in\mathbb{R}^{\frac{C}{r}\times C}$ represents the first fully connected layer, which compresses the number of channels from C to C/r, where r is the reduction ratio. $W_2\in\mathbb{R}^{C\times\frac{C}{r}}$ is the second fully connected layer, which restores the number of channels to C. $\delta$ and $\sigma$ denote the ReLU and Sigmoid functions, respectively. Finally, the Scale operation applies the learned channel weights to the original feature map to adjust the importance of each channel:
$$\tilde{X}_c=F_{scale}(U_c,S_c)=S_c\cdot U_c \tag{8}$$
$\tilde{X}$ represents the feature map obtained by reweighting U. The reweighting strengthens the response of important channels and suppresses less informative ones.
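The plain-Python sketch below makes Eqs 6 to 8 concrete by running Squeeze, Excitation, and Scale on a tiny 2 × 2 × 2 feature map. The weight matrices are hand-picked illustrative values. A real SE block learns these weights and would typically run in a framework such as PyTorch. Biases are omitted to match Eq 7, and `se_block` is a hypothetical helper name.

```python
import math


def matvec(w, v):
    """Matrix (nested lists, rows x cols) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def se_block(u, w1, w2):
    """Squeeze-and-Excitation on a C x H x W feature map stored as nested lists.

    w1: (C/r) x C weights of the reducing FC layer; w2: C x (C/r) weights of the
    restoring FC layer. Biases are omitted, as in S = sigmoid(W2 relu(W1 Z)).
    Returns the rescaled feature map and the per-channel weights s.
    """
    # Squeeze (Eq 6): global average pooling of each channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]
    # Excitation (Eq 7): FC -> ReLU -> FC -> Sigmoid, one weight in (0, 1) per channel
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    s = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]
    # Scale (Eq 8): multiply every element of channel c by s[c]
    scaled = [[[s_c * val for val in row] for row in ch] for s_c, ch in zip(s, u)]
    return scaled, s


u = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0: mean 2.5
     [[0.0, 0.0], [0.0, 0.0]]]   # channel 1: mean 0.0
w1 = [[1.0, 1.0]]                # C = 2 -> C/r = 1 (reduction ratio r = 2)
w2 = [[1.0], [-1.0]]             # C/r = 1 -> C = 2
scaled, s = se_block(u, w1, w2)
print([round(v, 4) for v in s])  # [0.9241, 0.0759]
```

With these weights, the channel with the larger average response receives a weight close to 1, and the empty channel is pushed toward 0.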
Cross-entropy loss function
In deep learning, training a neural network amounts to optimizing its weight parameters for a specific task. The loss function provides the error signal that backpropagation uses to adjust these parameters.
Most deep learning-based bearing fault diagnosis models use the cross-entropy loss to measure the discrepancy between the model’s predicted probability distribution and the true distribution. The cross-entropy loss [33] is rooted in information entropy and the Kullback-Leibler (KL) divergence. For a single sample, it is defined as follows:
$$L_{CE}=-\sum_{i=1}^{C}y_i\log(p_i) \tag{9}$$
In Eq 9, C denotes the total number of categories, $y_i$ is the i-th element of the one-hot encoded true label of the sample, and $p_i$ is the predicted probability that the sample belongs to category i.
During the model training process, by minimizing the cross-entropy loss, the discrepancy between the model’s predicted probability distribution and the actual label distribution is gradually reduced, enhancing the model’s classification accuracy and generalization capability.
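The following sketch is a quick numerical check of Eq 9. It evaluates the loss for one sample with a one-hot label. Only the true class has $y_i = 1$, so the sum reduces to $-\log p_{\text{true}}$. The probability vectors are made-up illustrative values.

```python
import math


def cross_entropy(y_onehot, probs, eps=1e-12):
    """Eq 9 for one sample: -sum_i y_i * log(p_i); eps guards against log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, probs))


y = [0, 1, 0]                                     # true class is category 1
confident = cross_entropy(y, [0.05, 0.90, 0.05])  # equals -ln 0.9
uncertain = cross_entropy(y, [0.40, 0.20, 0.40])  # equals -ln 0.2
print(f"{confident:.4f} {uncertain:.4f}")         # 0.1054 1.6094
```

Minimizing this quantity pushes the probability of the true class toward 1. This is also why every sample contributes with the same weight regardless of how hard it is, a property revisited in the noise-aware loss below.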
Proposed methods
This paper proposes a multi-domain collaborative denoising bearing fault diagnosis model based on a dynamic inter-domain attention mechanism (DIDAM) and noise-aware loss function (NALF), which utilizes both time-domain and frequency-domain signals as inputs, aiming to enhance the model’s robustness and fault diagnosis performance in noisy environments.
Multi-domain collaborative denoising diagnosis model
As shown in Fig 2, the proposed model consists of four core modules: data preprocessing, feature extraction, multi-domain feature fusion, and classification decision. Specifically, the components and their implementation steps are as follows:
Data preprocessing: First, the bearing vibration signals (time-domain signals) collected by the sensors are divided into equal-length segments of 1,024 points. These segments are then transformed into the frequency domain with the Fast Fourier Transform (FFT), and the data in each domain are normalized separately to eliminate scale differences between signals.
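The preprocessing step can be sketched as follows. The text specifies only the 1,024-point segment length, the FFT, and separate normalization of each domain. The remaining choices are therefore assumptions:

- non-overlapping windows, with the incomplete tail discarded;
- the one-sided amplitude spectrum (the first 512 FFT bins) as the frequency-domain input;
- per-segment z-score normalization.

The compact FFT is included only to keep the example self-contained, and all function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Compact recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def zscore(v):
    """Zero-mean, unit-variance scaling (one possible normalization; an assumption)."""
    mean = sum(v) / len(v)
    std = math.sqrt(sum((a - mean) ** 2 for a in v) / len(v)) or 1.0  # guard constant input
    return [(a - mean) / std for a in v]


def preprocess(signal, length=1024):
    """Cut non-overlapping windows, then build normalized time and frequency inputs."""
    if length < 1 or length & (length - 1):
        raise ValueError("this sketch's FFT needs a power-of-2 length")
    time_inputs, freq_inputs = [], []
    for start in range(0, len(signal) - length + 1, length):   # incomplete tail is dropped
        seg = signal[start:start + length]
        amp = [abs(c) for c in fft(seg)[: length // 2]]          # one-sided amplitude spectrum
        time_inputs.append(zscore(seg))
        freq_inputs.append(zscore(amp))
    return time_inputs, freq_inputs


# Synthetic tone with exactly 64 cycles per 1,024-sample window
sig = [math.sin(2 * math.pi * 64 * n / 1024) for n in range(2500)]
time_x, freq_x = preprocess(sig)
print(len(time_x), len(time_x[0]), len(freq_x[0]))  # 2 1024 512
```

Whether the frequency branch receives the half spectrum or all 1,024 bins is not stated in the text. Using all 1,024 bins would give both branches identical input lengths.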
Feature extraction: A dual-branch parallel structure is constructed, with each branch containing five cascaded convolution-pooling operations. The time-domain branch focuses on extracting local temporal pattern features from vibration waveforms. In contrast, the frequency-domain branch aims to capture deep spectral energy distribution patterns, ultimately forming two sets of high-dimensional feature tensors.
Multi-domain feature fusion: To fuse time-domain and frequency-domain information efficiently, a Dynamic Inter-Domain Attention Mechanism (DIDAM) is designed. This mechanism dynamically adjusts the fusion according to the importance of the time-domain and frequency-domain features. It thereby achieves flexible weighted fusion of information and enhances the collaborative expression of cross-domain features.
Classification decision: The fused feature vectors are integrated through fully connected layers and then input into a softmax classifier for bearing fault signal classification. Moreover, to improve the model’s applicability in industrial noise environments, a Noise-Aware Loss Function (NALF) is designed for model training and parameter updating, further enhancing classification accuracy.
For a more intuitive understanding of the specific architecture and parameter settings of the model proposed in this paper, Table 1 presents its detailed structure and parameter configuration.
Dynamic inter-domain attention mechanism
A Dynamic Inter-Domain Attention Mechanism (DIDAM) is proposed to effectively capture key information in time-domain and frequency-domain signals and enhance the collaborative expression capability of cross-domain features. This mechanism is constructed by cascading an improved channel attention module with a novel spatial attention module, which dynamically weights the time-domain and frequency-domain features to achieve efficient information fusion.
The Squeeze operation in SENet uses only global average pooling to obtain channel descriptors. The resulting descriptors carry limited information, which restricts the diversity and comprehensiveness of channel information. We therefore add global standard deviation pooling while retaining the original global average pooling, so that channel information is aggregated more fully and the expressive capability of the channel attention mechanism is enhanced. The improved channel attention mechanism is shown in Fig 3. The newly added global standard deviation pooling operation is defined as follows:
$$K_c=\sqrt{\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(U_c(i,j)-Z_c\right)^2} \tag{10}$$
Next, the average-pooled and standard-deviation-pooled descriptors are summed to obtain the final channel descriptor vector $Z_{KC}$. The remaining operations are the same as in SENet:
$$Z_{KC}=K\oplus Z \tag{11}$$
In Eq 11, $K=[K_1,\ldots,K_C]$ and $Z=[Z_1,\ldots,Z_C]$, and the symbol $\oplus$ denotes element-wise vector summation.
Traditional spatial attention mechanisms can prompt the model to focus on key areas within the feature map. However, their fixed convolutional windows or weighting schemes may overlook nonlinear relationships between regions and fail to capture subtle associations between different areas.
Therefore, this paper proposes a novel spatial attention mechanism, illustrated in Fig 4. First, three 1×1 convolutions are applied to the channel-weighted feature map, producing three feature maps, Q, G, and K (Eq 12).
Next, Q, G, and K are each reshaped into a two-dimensional matrix with one column per spatial position, that is, N = H × W columns. Q and G are then matrix-multiplied to obtain the pairwise similarities between positions. The resulting weights are normalized with the Softmax function to give the N × N spatial attention map M (Eq 13).
In Eq 13, a larger value in M indicates greater similarity between the corresponding two regions. M is then multiplied by K, and the product is reshaped into a new feature map V of dimensions C × H × W.
Finally, V is fused with the input feature map to obtain the final output feature map (Eq 14).
In summary, by reshaping the feature maps and performing multiple matrix operations on the similarities between regions, the spatial attention module in DIDAM can more effectively handle the nonlinear relationships between regions. Thus, it can capture more complex spatial dependencies and avoid the issue of neglecting fine-grained spatial information in traditional methods.
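The two stages of DIDAM described above can be prototyped on a tiny feature map. The plain-Python sketch below is a minimal illustration that rests on several assumptions the text does not pin down:

- the feature map is stored as a C × N matrix (N = H × W flattened positions);
- the standard deviation is the population form;
- the 1 × 1 convolutions have no bias;
- M is Softmax(QᵀG), normalized along each row, and V = K Mᵀ;
- "fusion" with the input is an element-wise residual sum.

All weights are hand-picked placeholders rather than learned values, and the function names are illustrative, not from the paper.

```python
import math


def matmul(a, b):
    """Nested-list matrix product: a (m x n) @ b (n x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def channel_descriptor(x):
    """Z_KC = Z + K: global average plus global (population) standard deviation
    of each channel. x is C x N (channels x flattened H*W positions)."""
    z = [sum(ch) / len(ch) for ch in x]
    k = [math.sqrt(sum((v - m) ** 2 for v in ch) / len(ch)) for ch, m in zip(x, z)]
    return [a + b for a, b in zip(z, k)]


def channel_attention(x, w1, w2):
    """SE-style excitation on Z_KC: sigmoid(W2 relu(W1 z)), then per-channel scaling."""
    z = [[v] for v in channel_descriptor(x)]                         # C x 1 column
    hidden = [[max(0.0, r[0])] for r in matmul(w1, z)]               # (C/r) x 1, ReLU
    s = [1.0 / (1.0 + math.exp(-r[0])) for r in matmul(w2, hidden)]  # C weights in (0, 1)
    return [[s_c * v for v in ch] for s_c, ch in zip(s, x)]


def spatial_attention(x, wq, wg, wk):
    """Position-to-position attention. A bias-free 1x1 conv equals a (C' x C) matrix
    applied to the C x N map. Fusion with the input is assumed to be addition."""
    q, g, k = matmul(wq, x), matmul(wg, x), matmul(wk, x)
    m = softmax_rows(matmul(transpose(q), g))   # N x N spatial attention map M
    v = matmul(k, transpose(m))                 # V[c][i] = sum_j K[c][j] * M[i][j]
    return [[a + b for a, b in zip(xr, vr)] for xr, vr in zip(x, v)], m


def didam(x, w1, w2, wq, wg, wk):
    """Cascade: improved channel attention, then the spatial attention module."""
    return spatial_attention(channel_attention(x, w1, w2), wq, wg, wk)


x = [[1.0, 0.0, 2.0, 1.0],      # C = 2 channels, H x W = 2 x 2 flattened to N = 4
     [0.0, 1.0, 1.0, 0.0]]
out, attn = didam(x, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]],
                  wq=[[1.0, 0.0]], wg=[[1.0, 0.0]], wk=[[1.0, 0.0], [0.0, 1.0]])
print(len(out), len(out[0]), len(attn))  # 2 4 4
```

Two channels with the same mean but different spread get identical average-pooled descriptors, but different `channel_descriptor` values. This is the motivation for adding standard deviation pooling. Setting `wq` to zeros makes every row of M uniform, so `spatial_attention` then adds each channel's mean to every position, which is a convenient sanity check.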
Noise-aware loss function
When used for bearing fault diagnosis, the cross-entropy loss assigns the same weight to every sample regardless of its difficulty. As a result, the accumulated loss of the many easy samples can overshadow the contribution of the hard samples and bias the gradient updates toward easy samples. In addition, the cross-entropy loss is sensitive to outliers. When the dataset contains outliers or noise, the model may focus excessively on these points, which harms its generalization ability.
Therefore, this paper proposes a Noise-Aware Loss Function (NALF). The predictions of a deep learning model become more reliable as training progresses, so a threshold $\tau$ is introduced that increases with the number of training epochs (Eq 15).
In Eq 15, N represents the number of sample categories, $E_s$ the total number of training epochs, and E the current training epoch. In each training epoch, the probability that each sample is predicted correctly is compared with $\tau$. If this probability exceeds $\tau$, the prediction is regarded as correct and the sample is assigned a lower weight. Otherwise, the sample is assigned a higher weight:
$$w=\begin{cases}W_l, & P_r>\tau\\ W_h, & P_r\le\tau\end{cases}$$
Here, $P_r$ is the probability that the model assigns to the correct class of a sample in the current epoch. $W_l$ and $W_h$ denote the lower and higher weights, respectively, and their values are set according to reference [34]. The complete expression of the Noise-Aware Loss Function (NALF) is given in Eq 17.
This design enables the model to update its parameters based on the actual prediction for each sample. In other words, it can effectively reduce the influence of prediction errors caused by noise. When the model’s predictions are correct, it learns more reliable decision rules, thereby mitigating the adverse effects of noise on the model.
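The sketch below illustrates the sample-weighting rule described above. It is not the paper's Eq 17. It assumes that NALF multiplies each sample's cross-entropy term by its weight and averages over the batch, an assumption the text does not confirm. The threshold is passed in because Eq 15 is not reproduced here, so a caller would recompute it each epoch. The default weights of 0.5 and 1.0 are placeholders, since the paper takes $W_l$ and $W_h$ from reference [34]. Both function names are hypothetical.

```python
import math


def nalf_weight(p_true, threshold, w_low, w_high):
    """Lower weight when the true-class probability exceeds the threshold, else higher."""
    return w_low if p_true > threshold else w_high


def weighted_ce(probs, labels, threshold, w_low=0.5, w_high=1.0):
    """Hypothetical NALF-style loss: batch mean of w_i * (-log p_true_i).

    probs: per-sample class-probability lists (softmax outputs).
    labels: integer class indices.
    threshold: the epoch-dependent threshold from Eq 15 (computed by the caller).
    w_low / w_high: PLACEHOLDER values; the paper takes them from ref. [34].
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = max(p[y], 1e-12)  # probability assigned to the true class
        w = nalf_weight(p_true, threshold, w_low, w_high)
        total += -w * math.log(p_true)
    return total / len(labels)


probs = [[0.90, 0.05, 0.05],   # true-class prob 0.90 > 0.5  -> low weight
         [0.30, 0.40, 0.30]]   # true-class prob 0.40 <= 0.5 -> high weight
loss = weighted_ce(probs, [0, 1], threshold=0.5)
```

With a rising threshold, fewer samples clear it early in training, so most samples keep the higher weight until the model's predictions become more confident.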
Experimental validation and results discussion
Experimental setup and noise simulation
All experiments in this section were conducted on a Windows 11 computing platform equipped with a 12th-generation processor and an NVIDIA RTX 4090 graphics card. The algorithms were implemented in Python using the PyCharm integrated development environment and the PyTorch deep learning framework. The model was trained with the Adam optimizer, using an initial learning rate of 0.001, a batch size of 128, and 100 training epochs, settings chosen for stable and efficient convergence.
Gaussian white noise is statistically similar to many kinds of random noise in real environments. The experiments are therefore conducted with Gaussian white noise of different intensities added to the signals. Specifically, six Signal-to-Noise Ratio (SNR) levels (–10 dB, –8 dB, –6 dB, –4 dB, –2 dB, and 0 dB) are used to evaluate the performance of the proposed model under varying noise conditions. The SNR is calculated as follows:
$$\mathrm{SNR}=10\log_{10}\left(\frac{P_s}{P_n}\right)$$
Here, $P_s$ and $P_n$ represent the signal and noise power, respectively, and the SNR is expressed in decibels (dB). A higher SNR means that the signal is stronger relative to the noise, so the signal is cleaner. Conversely, a lower SNR means a larger proportion of noise and more severe interference with the signal.
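The sketch below shows one way to corrupt a signal at a target SNR under the definition above. The text does not say how the noise was generated. This sketch draws zero-mean Gaussian noise and rescales it so that its empirical power equals exactly $P_n = P_s / 10^{\mathrm{SNR}/10}$. A common alternative draws noise with standard deviation $\sqrt{P_n}$, which matches the target only on average. At –10 dB, the noise power is ten times the signal power. The function names are hypothetical.

```python
import math
import random


def power(x):
    """Mean power of a sequence: (1/L) * sum x^2."""
    return sum(v * v for v in x) / len(x)


def add_gaussian_noise(signal, snr_db, rng=None):
    """Return (noisy_signal, noise) with 10*log10(Ps/Pn) equal to snr_db.

    Zero-mean Gaussian noise is drawn, then rescaled so its empirical power is
    exactly Pn = Ps / 10**(snr_db / 10).
    """
    rng = rng or random.Random()
    p_n = power(signal) / 10 ** (snr_db / 10)          # required noise power
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    scale = math.sqrt(p_n / power(noise))
    noise = [scale * v for v in noise]
    return [s + n for s, n in zip(signal, noise)], noise


def snr_db(signal, noise):
    return 10 * math.log10(power(signal) / power(noise))


clean = [math.sin(2 * math.pi * 5 * n / 1024) for n in range(1024)]
noisy, noise = add_gaussian_noise(clean, -10, random.Random(0))
print(round(snr_db(clean, noise), 6))          # -10.0
print(round(power(noise) / power(clean), 6))   # 10.0
```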
Dataset description
CWRU Dataset [35]: This dataset was collected at a sampling rate of 12 kHz under four load conditions (0, 1, 2, and 3 horsepower). It covers four health conditions: healthy (H), inner race fault (IF), outer race fault (OF), and rolling element fault (BF). Each of the three fault types is further divided into three damage sizes: 0.007, 0.014, and 0.021 inches. Details of the CWRU data used in this study are shown in Table 2.
MFPT Dataset [36]: This dataset consists of 23 categories. The baseline group includes three sets of normal data and three sets of outer race fault data, all collected under a load of 270 lbs, a shaft speed of 25 Hz, and a sampling rate of 97,656 Hz. The extended group contains two variable-load subsets. The first is seven sets of outer race fault data (loads from 25 to 300 lbs, sampling rate 48,828 Hz). The second is seven sets of inner race fault data (loads from 0 to 300 lbs, sampling rate 48,828 Hz). Details of the MFPT data used in this study are shown in Table 3.
Fault diagnosis under different noise environments
This subsection analyzes the fault diagnosis performance of the proposed model under different signal-to-noise ratio (SNR) conditions. As shown in Fig 5(a), on the CWRU dataset the diagnostic accuracy reaches 100% at SNR = 0 dB. Accuracy declines gradually as the SNR decreases (i.e., as noise increases), but it remains at least 96.91% for SNR ≥ –4 dB, indicating strong noise resistance at relatively high SNRs. When the SNR drops to –6 dB and –8 dB, the accuracy decreases to 94.58% and 89.62%, respectively. This shows that the model still identifies most fault types correctly despite heavier noise. Even in an extreme noise environment (SNR = –10 dB), the accuracy remains at 81.25%, indicating that the model can still extract useful fault features under severe noise.
As shown in Fig 5(b), on the MFPT dataset the proposed model achieves a fault diagnosis accuracy of 96.74% at SNR = 0 dB, slightly lower than the 100% obtained on the CWRU dataset. The recognition rate declines as the SNR decreases, but it remains above 90% for SNR ≥ –6 dB. This shows that the model can still reliably identify fault features under moderate noise. Even in the extreme case of SNR = –10 dB, the accuracy remains at 76.36%, well above the level of random guessing, indicating that the model still extracts useful fault features. Under the same SNR, accuracy on MFPT is slightly lower than on CWRU, which may be attributable to lower signal quality and stronger background noise in the MFPT recordings. Notably, on both datasets the accuracy of the proposed model declines gradually rather than abruptly. This suggests that the model maintains stable performance under both high- and low-SNR conditions.
We also plotted confusion matrices to show how the model recognizes each fault type under different SNR conditions. Because the model achieves 100% accuracy on the CWRU dataset at SNR = 0 dB, the confusion matrix for this case is omitted. As shown in Fig 6, for SNR ≥ –6 dB most misclassifications involve Category 2, which is often misidentified as Category 0 or Category 4. This suggests that Category 2 features lose discriminative power at low SNR and spread across multiple categories. When the SNR drops further to –8 dB, Category 0 is also misidentified as Category 2, in addition to the Category 2 errors. This indicates that noise makes the features of Categories 0 and 2 more similar, possibly because their time-frequency features are close and the noise masks their subtle differences. At SNR = –10 dB, almost all categories are severely misclassified. This indicates that extreme noise makes the category features highly unstable and further weakens the model’s discriminative ability.
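The per-class error analysis above reads the off-diagonal cells of a confusion matrix. The short sketch below shows how such a matrix, the overall accuracy, and the largest confusions can be computed from true and predicted labels. The labels are a toy example invented to mimic the described pattern, in which Category 2 is read as 0 or 4. They are not the paper's data, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i][j] counts samples whose true class is i and predicted class is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Diagonal (correct) count divided by the total number of samples."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def top_confusions(cm, k=3):
    """Largest off-diagonal cells as (count, true_class, predicted_class)."""
    cells = [(cm[i][j], i, j) for i in range(len(cm)) for j in range(len(cm))
             if i != j and cm[i][j] > 0]
    return sorted(cells, reverse=True)[:k]


# Toy labels (not the paper's data): Category 2 is sometimes read as 0 or 4
y_true = [0, 0, 1, 2, 2, 2, 2, 3, 4, 4]
y_pred = [0, 0, 1, 2, 0, 4, 4, 3, 4, 4]
cm = confusion_matrix(y_true, y_pred, num_classes=5)
print(accuracy(cm))        # 0.7
print(top_confusions(cm))  # [(2, 2, 4), (1, 2, 0)]
```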
To further validate this phenomenon, we performed t-SNE dimensionality reduction and visualized the feature distributions under different SNR conditions, as shown in Fig 7. On the CWRU dataset, samples from different categories form relatively well-separated clusters under high SNR conditions. As the SNR decreases, the points become increasingly dispersed and the category boundaries become more blurred. Specifically, as the SNR decreases to –8 dB, Categories 0, 2, and 4 gradually move closer in the t-SNE plot and eventually overlap. When SNR = –10 dB, the sample points of most categories intersect, making the categories nearly inseparable. This trend matches the misclassifications in the confusion matrices, further confirming the impact of noise on fault feature extraction.
On the MFPT dataset, we also plotted the confusion matrices and t-SNE visualizations (Figs 8 and 9). The confusion matrices show that when SNR ≥ –4 dB, the main misclassifications occur among Categories 0, 1, and 2, indicating that these categories have similar features under low SNR conditions. When the SNR drops to –6 dB and below, misclassifications among Categories 0, 1, 2, 4, and 5 increase significantly, suggesting that as noise interference intensifies, the distinguishability of features between categories decreases. In the t-SNE visualizations, the sample points of each category on the MFPT dataset are more dispersed than on the CWRU dataset, further confirming that the MFPT signals are of poorer quality and more strongly affected by noise.
Comparison with the state-of-the-art methods
To further verify the noise resistance of the proposed model, we compared its performance with six state-of-the-art denoising models (MCNN-LSTM [24], AMFCN [26], Mel-CNN [37], DRSN [38], IDRSN [39], and RDDAN [40]) under different SNR conditions, as shown in Table 4. As the SNR decreases (i.e., the noise level increases), the diagnostic accuracy of all models declines, an expected trend in denoising tasks. Specifically, under high SNR conditions (SNR down to –4 dB), the proposed model outperforms most existing methods, ranking only behind RDDAN and AMFCN. In the lower SNR range (between –4 dB and –10 dB), its diagnostic accuracy is second only to RDDAN. In very noisy environments (SNR = –8 dB and SNR = –10 dB), its advantage over the remaining methods becomes more pronounced. Notably, the proposed model shows the smallest performance degradation as the SNR continues to decrease, which demonstrates its noise robustness. In summary, the proposed model delivers strong fault diagnosis performance across noise levels, with particularly good robustness in high-noise environments, indicating its practical value for real-world applications.
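The notion of "least degradation" can be made concrete by measuring, for each model, the accuracy lost between the highest and lowest SNR levels tested and ranking models by that drop. A minimal sketch follows; the function names and the toy numbers are placeholders, not values from Table 4.

```python
def accuracy_drop(acc_by_snr):
    """Accuracy lost (percentage points) from the highest to the lowest SNR tested.

    `acc_by_snr` maps SNR in dB to accuracy in %.
    """
    cleanest = max(acc_by_snr)  # highest SNR = weakest noise
    noisiest = min(acc_by_snr)  # lowest SNR = strongest noise
    return acc_by_snr[cleanest] - acc_by_snr[noisiest]


def rank_by_robustness(results):
    """Model names ordered from smallest to largest accuracy drop."""
    return sorted(results, key=lambda name: accuracy_drop(results[name]))


# Placeholder numbers for illustration only (not the reported results).
toy = {
    "model_a": {0: 99.0, -4: 95.0, -10: 70.0},
    "model_b": {0: 100.0, -4: 96.0, -10: 60.0},
}
print(rank_by_robustness(toy))  # ['model_a', 'model_b']
```

This endpoint-only measure ignores the shape of each accuracy curve between the extremes; averaging the drop over consecutive SNR steps would be an alternative.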
To illustrate the performance differences among the models under strong noise more intuitively, we selected SNR = –8 dB (a level that combines high noise intensity with broad coverage of the comparison methods) and plotted their confusion matrices and t-SNE visualizations on the CWRU test set, as shown in Figs 10 and 11.
The confusion matrices show that DRSN, AMFCN, Mel-CNN, and IDRSN make substantial classification errors across multiple categories, primarily among Categories 1, 2, 5, and 8. This may be due to the limited noise resistance of these methods, which makes it difficult for them to distinguish these inherently similar fault patterns under heavy noise. In contrast, our model makes only a few misclassifications, in Categories 0, 1, and 2, most of which are Category 2 samples misidentified as Category 0. This may stem from the substantial similarity between Categories 0 and 2 in the feature space, which intense noise obscures more easily. Although the overall recognition accuracy of the proposed model is slightly lower than that of RDDAN, its misclassifications are mainly between adjacent categories and are relatively concentrated, which to some extent reflects its ability to model feature boundaries and maintain high fault recognition capability even in high-noise environments.
The t-SNE visualizations in Fig 11 further corroborate this conclusion. DRSN, AMFCN, Mel-CNN, and IDRSN show large overlapping regions among Categories 1, 2, 5, and 8, indicating that they struggle to separate these categories under strong noise. In contrast, the feature distributions extracted by our method and RDDAN are much clearer, with distinct boundaries and almost no overlap between categories, which is highly consistent with the confusion matrix results. This further demonstrates the proposed model's strong feature representation capability and good inter-class separability under complex noise conditions.
Ablation experiment
Finally, to analyze the impact of the Dynamic Inter-domain Attention Mechanism (DIDAM) and the Noise-Aware Loss Function (NALF) on the performance of the proposed model, we conducted an ablation study on the CWRU dataset, as shown in Fig 12. All models in this experiment follow the framework described in the section "Multi-Domain Collaborative Denoising Diagnosis Model," with the following experimental settings:
Model 1: DIDAM is not used, and the loss function is cross-entropy.
Model 2: DIDAM is used, and the loss function is cross-entropy.
Model 3: DIDAM is not used, and the loss function is NALF.
Model 4: The proposed model (DIDAM + NALF).
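The four settings form a 2 × 2 design (DIDAM on or off, crossed with cross-entropy or NALF). The sketch below enumerates that grid and, as one assumed way to read ablation results, separates each component's gain over Model 1 from their interaction: at a given SNR, a positive interaction term is what a synergistic (super-additive) effect would look like. Function names and dictionary keys are hypothetical, and no accuracies from Fig 12 are reproduced.

```python
from itertools import product


def ablation_variants():
    """2x2 grid: DIDAM on/off crossed with cross-entropy vs. NALF, numbered as Models 1-4."""
    variants = []
    # product yields (False, False), (False, True), (True, False), (True, True),
    # i.e. Model 1 (neither), Model 2 (DIDAM), Model 3 (NALF), Model 4 (both).
    for idx, (use_nalf, use_didam) in enumerate(product([False, True], repeat=2), start=1):
        variants.append({
            "name": f"Model {idx}",
            "use_didam": use_didam,
            "loss": "NALF" if use_nalf else "cross-entropy",
        })
    return variants


def component_gains(acc):
    """Gains over Model 1 and the interaction term, from accuracies keyed 1-4.

    interaction = acc[4] - acc[2] - acc[3] + acc[1]; a positive value means the
    combined gain exceeds the sum of the two individual gains.
    """
    return {
        "didam_only": acc[2] - acc[1],
        "nalf_only": acc[3] - acc[1],
        "combined": acc[4] - acc[1],
        "interaction": acc[4] - acc[2] - acc[3] + acc[1],
    }


for v in ablation_variants():
    print(v)
```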
The experimental results show that Model 2 outperforms Model 1 under all SNR conditions, particularly at low SNR levels (SNR = –8 dB and SNR = –10 dB), demonstrating that DIDAM substantially enhances the model's robustness. Similarly, Model 3 performs better than Model 1 under most SNR conditions, especially in low-SNR environments (e.g., SNR = –8 dB and SNR = –10 dB), indicating that NALF helps improve the model's noise resistance. Notably, Model 4 surpasses the other three variants under all SNR conditions, achieving an accuracy of 81.25% even under extreme noise (SNR = –10 dB). This result confirms that combining DIDAM and NALF can significantly enhance the model's noise robustness and overall diagnostic accuracy. In summary, DIDAM and NALF each improve performance through different mechanisms, and their combination amplifies this effect, yielding a synergistic enhancement.
Discussion
Noise interference in industrial scenarios often masks the original characteristics of bearing fault signals, leading to insufficient diagnostic reliability of fault diagnosis models based on deep learning. Although existing denoising models have achieved some success, they still have certain limitations:
- Most methods focus exclusively on processing time-domain signals, neglecting key information contained in other potential domains. This makes it difficult for the model to accurately extract the original features of fault signals under intense background noise, limiting its ability to express the high-dimensional features of different fault patterns.
- Some methods have begun to extract features from different domains. However, they typically rely on fixed feature concatenation or fusion schemes and fail to fully consider the different contributions of features from different signal domains to model performance, leading to poor multi-source information fusion.
- Existing denoising models usually update network parameters with the standard cross-entropy loss, overlooking the fact that strong noise may lead the model to make incorrect decisions and thereby bias the direction of gradient updates.
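The third point can be illustrated with standard softmax cross-entropy, whose gradient with respect to the logits is p − y (predicted probabilities minus the one-hot label). Every sample is weighted equally, and the gradient is largest when the network confidently predicts the wrong class, so samples whose features are heavily masked by noise can steer the updates strongly. The sketch below shows only this behaviour of standard cross-entropy and checks the analytic gradient by finite differences; it does not implement NALF, and the helper names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard softmax cross-entropy for a single sample."""
    return -math.log(softmax(logits)[label])


def ce_grad(logits, label):
    """Analytic gradient of cross_entropy w.r.t. the logits: p - one_hot(label)."""
    p = softmax(logits)
    return [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]


def numeric_grad(logits, label, eps=1e-6):
    """Central finite differences, used only to check ce_grad."""
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += eps
        down = list(logits)
        down[i] -= eps
        grad.append((cross_entropy(up, label) - cross_entropy(down, label)) / (2 * eps))
    return grad


def norm(v):
    return math.sqrt(sum(x * x for x in v))


logits = [4.0, 0.0, 0.0]  # network is confident in class 0
for label in (0, 1):       # label 0: prediction correct; label 1: confidently wrong
    print(label, round(cross_entropy(logits, label), 3), round(norm(ce_grad(logits, label)), 3))
```

The confidently wrong case yields both a much larger loss and a much larger gradient, which is the behaviour a noise-aware loss would aim to temper.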
This paper proposes a multi-domain collaborative denoising bearing fault diagnosis model based on a dynamic inter-domain attention mechanism and a noise-aware loss function. The model simultaneously extracts high-dimensional features of bearing fault signals from different domains, enriching and diversifying the expression of the original signal's high-dimensional features and enhancing the model's fault recognition capability under substantial noise interference. Additionally, a dynamic inter-domain attention mechanism is designed to flexibly adjust the fusion according to the importance of each signal domain, improving the efficiency and accuracy of information fusion. Finally, a noise-aware loss function is constructed to reduce deviations in the model's gradient update direction caused by strong noise, thereby improving the stability and reliability of fault diagnosis.
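To illustrate the difference between fixed concatenation and importance-weighted fusion, the sketch below combines per-domain feature vectors using softmax-normalized relevance scores. It is a simplified stand-in, not the DIDAM architecture itself: it assumes every domain (e.g., time and frequency) has already been mapped to a feature vector of the same length and that one scalar score per domain is supplied directly, whereas in a trained model those scores would be produced by learnable layers from the input, which is what makes the weighting dynamic. `fuse_domains` is a hypothetical name.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_domains(features, scores):
    """Weighted sum of per-domain feature vectors with softmax(scores) as weights.

    features: equal-length feature vectors, one per signal domain
    scores:   one relevance score per domain (given directly in this sketch;
              a trained model would compute them from the input)
    Returns (fused_vector, weights).
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per domain")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all domain feature vectors must have the same length")
    weights = softmax(scores)
    fused = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]
    return fused, weights


time_feat = [0.9, 0.1, 0.4]  # toy time-domain features
freq_feat = [0.2, 0.8, 0.6]  # toy frequency-domain features
fused, w = fuse_domains([time_feat, freq_feat], scores=[0.0, 0.0])
print(w)  # [0.5, 0.5]: equal scores reduce to a plain average
```

With equal scores the fusion degenerates to averaging, which behaves like a fixed fusion rule; raising one domain's score shifts the fused vector toward that domain's features.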
Experimental results on the publicly available CWRU and MFPT datasets show that the proposed model exhibits superior diagnostic capability under different noise intensities. Even in an extreme noise environment with an SNR of –10 dB, the model achieved diagnostic accuracies of 81.25% and 76.36% on CWRU and MFPT, respectively. Compared to most existing mainstream denoising models, the proposed model maintains high diagnostic precision under low SNR conditions (extreme noise environments), demonstrating its effectiveness.
Despite the promising experimental results of the proposed model, some limitations remain. For example, the experiments in this paper are based on standardized datasets such as CWRU and MFPT, whereas the complex conditions in actual industrial settings (such as variable speed, variable load, and multi-source mixed noise) may make fault signals more strongly time-varying and thus reduce the model's generalization ability. In addition, multi-domain feature extraction and the dynamic inter-domain attention mechanism increase the computational burden, so the model may struggle to meet real-time diagnostic requirements in resource-constrained scenarios such as industrial edge devices. Moreover, the model relies on training data with known fault types, may perform poorly on unseen fault patterns (such as complex or early weak faults), and lacks incremental learning capability.
In the future, a lightweight and more practical denoising bearing fault diagnosis model could be designed to reduce computational latency while maintaining precision. Additionally, introducing non-Gaussian noise, multi-source mixed noise, and data from varying operating conditions could enhance the model's environmental adaptability. Furthermore, meta-learning or contrastive learning could be incorporated to improve the model's ability to identify new faults from only a few samples.
Conclusion
This paper proposes a multi-domain collaborative denoising bearing fault diagnosis model based on the dynamic inter-domain attention mechanism and noise-aware loss function. First, the model simultaneously extracts high-dimensional features of bearing fault signals from different domains, enriching and diversifying the expression of high-dimensional features in the original signals and thereby enhancing the model's fault recognition ability under substantial noise interference. Second, a dynamic inter-domain attention mechanism (DIDAM) is introduced to distinguish the importance of information from different signal domains and enhance the collaborative expression of cross-domain features. Finally, a noise-aware loss function (NALF) is proposed for model training and parameter updating; it aims to reduce the negative impact of noise on the model, allowing it to learn more reliable decision rules. Experimental results on two public datasets, CWRU and MFPT, show that compared to most existing mainstream denoising fault diagnosis models, the proposed model exhibits superior diagnostic capability under different noise intensities. In particular, its anti-interference ability and robustness are effectively improved in extreme noise environments, demonstrating its effectiveness.
References
- 1. Cao H, Niu L, Xi S. Mechanical model development of rolling bearing-rotor systems: a review. Mech Syst Signal Process. 2018;102:37–58.
- 2. Zhang T, Liu S, Zhang S. Review on fault diagnosis on the rolling bearing. J Phys: Conf Ser. 2021;1820(1):012107.
- 3. Peng B, Bi Y, Xue B. A survey on fault diagnosis of rolling bearings. Algorithms. 2022;15(10):347.
- 4. Hakim M, Omran AAB, Ahmed AN. A systematic review of rolling bearing fault diagnoses based on deep learning and transfer learning: taxonomy, overview, application, open challenges, weaknesses and recommendations. Ain Shams Eng J. 2023;14(4):101945.
- 5. Barai V, Ramteke SM, Dhanalkotwar V. Bearing fault diagnosis using signal processing and machine learning techniques: a review. In: IOP Conference Series: Materials Science and Engineering. 2022. 012034.
- 6. Chen X, Yang R, Xue Y, et al. Deep transfer learning for bearing fault diagnosis: a systematic review since 2016. IEEE Trans Instrument Measur. 2023;72:1–21.
- 7. Wang S, Wang Q, Xiao Y. Research on rotor system fault diagnosis method based on vibration signal feature vector transfer learning. Eng Failure Anal. 2022;139:106424.
- 8. Salunkhe VG, Khot SM, Desavale RG. Unbalance bearing fault identification using highly accurate Hilbert–Huang transform approach. J Nondestruct Eval Diagnost Prognost Eng Syst. 2023;6(3).
- 9. Thamba NB, Aravind A, Rakesh A. Application of EMD, ANN and DNN for self-aligning bearing fault diagnosis. Archiv Acoust. 2018;43(2):163–75.
- 10. Goyal D, Dhami SS, Pabla BS. Non-contact fault diagnosis of bearings in machine learning environment. IEEE Sens J. 2020;20(9):4816–23.
- 11. Hamadache M, Jung JH, Park J. A comprehensive review of artificial intelligence-based approaches for rolling element bearing PHM: shallow and deep learning. JMST Adv. 2019;1:125–51.
- 12. Huang W, Cheng J, Yang Y. An improved deep convolutional neural network with multi-scale information for bearing fault diagnosis. Neurocomputing. 2019;359:77–92.
- 13. He J, Wu P, Tong Y, Zhang X, Lei M, Gao J. Bearing fault diagnosis via improved one-dimensional multi-scale dilated CNN. Sensors (Basel). 2021;21(21):7319. pmid:34770636
- 14. Song X, Cong Y, Song Y. A bearing fault diagnosis model based on CNN with wide convolution kernels. J Ambient Intell Humaniz Comput. 2022;13(8):4041–56.
- 15. Hou Y, Wang J, Chen Z. Diagnosisformer: an efficient rolling bearing fault diagnosis method based on improved transformer. Eng Appl Artif Intell. 2023;124:106507.
- 16. Guo Y, Mao J, Zhao M. Rolling bearing fault diagnosis method based on attention CNN and BiLSTM network. Neural Process Lett. 2023;55(3):3377–410.
- 17. Liao W, Fu W, Yang K. Multi-scale residual neural network with enhanced gated recurrent unit for fault diagnosis of rolling bearing. Measur Sci Technol. 2024;35(5):056114.
- 18. Rizescu CI, Constantin V, Rizescu D. Noise analyses for rolling bearings. In: Proceedings of the International Conference of Mechatronics and Cyber-MixMechatronics–2019. Springer; 2020. p. 74–81.
- 19. Yang C, Qiao Z, Zhu R. An intelligent fault diagnosis method enhanced by noise injection for machinery. IEEE Trans Instrument Measur. 2023;72:1–11.
- 20. Chen P, Wu Y, Xu C, et al. Interference suppression of nonstationary signals for bearing diagnosis under transient noise measurements. IEEE Trans Reliab. 2025.
- 21. Pancaldi F, Dibiase L, Cocconcelli M. Impact of noise model on the performance of algorithms for fault diagnosis in rolling bearings. Mech Syst Signal Process. 2023;188:109975.
- 22. Meng Z, Qin X, Liu J. A denoising algorithm based on ARIMA and competitive K-SVD for the diagnosis of rolling bearing faults. Appl Acoust. 2025;228:110309.
- 23. Li X, Zhang W, Ding Q. A robust intelligent fault diagnosis method for rolling element bearings based on deep distance metric learning. Neurocomputing. 2018;310:77–95.
- 24. Chen X, Zhang B, Gao D. Bearing fault diagnosis based on multi-scale CNN and LSTM model. J Intell Manuf. 2021;32(4):971–87.
- 25. Hakim M, Omran AAB, Inayat-Hussain JI, Ahmed AN, Abdellatef H, Abdellatif A, et al. Bearing fault diagnosis using lightweight and robust one-dimensional convolution neural network in the frequency domain. Sensors (Basel). 2022;22(15):5793. pmid:35957359
- 26. Li F, Wang L, Wang D. An adaptive multiscale fully convolutional network for bearing fault diagnosis under noisy environments. Measurement. 2023;216:112993.
- 27. Wang Q, Xu F. A novel rolling bearing fault diagnosis method based on adaptive denoising convolutional neural network under noise background. Measurement. 2023;218:113209.
- 28. Han S, Sun S, Z Z. Deep residual multiscale convolutional neural network with attention mechanism for bearing fault diagnosis under strong noise environment. IEEE Sens J. 2024;24(6):9073–81.
- 29. Gao H, Zhang X, Gao X. Multi-timescale attention residual shrinkage network with adaptive global-local denoising for rolling-bearing fault diagnosis. Knowl-Based Syst. 2024;304:112478.
- 30. Wahab MF, Gritti F, O’Haver TC. Discrete Fourier transform techniques for noise reduction and digital enhancement of analytical signals. TrAC Trends Analyt Chem. 2021;143:116354.
- 31. Henry M. An ultra-precise fast Fourier transform. Measurement. 2023;220:113372.
- 32. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. p. 7132–41.
- 33. Mao A, Mohri M, Zhong Y. Cross-entropy loss functions: theoretical analysis and applications. In: International Conference on Machine Learning. PMLR; 2023. p. 23803–28.
- 34. Lin TY, Goyal P, Girshick R. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. p. 2980–8.
- 35. Smith WA, Randall RB. Rolling element bearing diagnostics using the Case Western Reserve University data: a benchmark study. Mech Syst Signal Process. 2015;64:100–31.
- 36. MFPT dataset. Society for Machinery Failure Prevention Technology. 2020.
- 37. Shan S, Liu J, Wu S. A motor bearing fault voiceprint recognition method based on Mel-CNN model. Measurement. 2023;207:112408.
- 38. Zhao M, Zhong S, Fu X. Deep residual shrinkage networks for fault diagnosis. IEEE Trans Indust Inf. 2019;16(7):4681–90.
- 39. Tong J, Tang S, Wu Y. A fault diagnosis method of rolling bearing based on improved deep residual shrinkage networks. Measurement. 2023;206:112282.
- 40. Shi H, Gan C, Zhang X. A fault diagnosis method for rolling bearings based on RDDAN under multivariable working conditions. Measur Sci Technol. 2022;34(2):025003.