RGB-based visual encoding of vibration data for gearbox fault diagnosis using U-Net segmentation model

İrfan Kiliç; Gülşah Karaduman; Beyda Tasar; Orhan Yaman

doi:10.1371/journal.pone.0350838

Abstract

This study presents an approach for diagnosing gearbox gear faults that makes numerical vibration data amenable to analysis by image-based deep learning models. Vibration signals from four different sensors (a1, a2, a3, a4) were taken from the Gearbox Fault Diagnosis dataset available on Kaggle. The maximum, mean, and minimum values of these signals were calculated, normalized to the range [0–255], and mapped to the red, green, and blue (RGB) color channels, respectively. As a result, 500 images of 256 × 256 pixels were generated for each category. These images were then used to train a pre-trained U-Net deep learning model for segmentation, with only 10 training epochs. The model achieved a classification accuracy of 99.87% and a mean average precision (mAP) score of 99.74%. These results indicate that converting non-visual numerical data into RGB images and analyzing them with convolutional neural networks (CNNs) offers significant advantages over commonly used machine learning and text-based deep learning methods. To the best of our knowledge, this is the first study to classify numerical sensor data with such high accuracy by converting it into a visual format. The proposed method not only advances the field of gearbox fault detection but also introduces a new paradigm for solving similar signal-based engineering problems.

Citation: Kiliç İ, Karaduman G, Tasar B, Yaman O (2026) RGB-based visual encoding of vibration data for gearbox fault diagnosis using U-Net segmentation model. PLoS One 21(6): e0350838. https://doi.org/10.1371/journal.pone.0350838

Editor: Carlos Alberto Cruz-Villar, CINVESTAV IPN: Centro de Investigacion y de Estudios Avanzados del Instituto Politecnico Nacional, MEXICO

Received: January 10, 2026; Accepted: May 19, 2026; Published: June 22, 2026

Copyright: © 2026 Kiliç et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The dataset may be accessed with the following link: https://www.kaggle.com/datasets/brjapon/gearbox-fault-diagnosis. The MATLAB code used in the study, which also contains the dataset, and other files generated by the code are publicly available on GitHub: https://github.com/irfankilic/RGB_GearboxDetection/tree/main.

Funding: This study was funded by the Firat University Scientific Research Project Coordination Office (FUBAP) under project number MF.25.103 with the Open Access (OA) publication fund. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Gearboxes are basic components in many industrial and precision applications. With the impact of Industry 4.0, condition monitoring and fault diagnosis of rotating machines have gained importance [1–3]. These systems play a critical role in motion and power transmission in modern industrial mechanisms [2]. Spur gearboxes [4,5], helical gearboxes [6], bevel gearboxes [3,7], and planetary gearboxes [8] are preferred in different rotating machine applications because of their flexibility. Gearboxes also play a central role in automation processes by increasing the efficiency of mechanical systems.

Due to harsh operating conditions and continuous loading, gearboxes are prone to failure [9]. Meshing gear teeth undergo both rolling and sliding movements, and the sliding usually acts in the opposite direction [10,11]. Insufficient lubrication damages the gear surfaces by increasing contact, temperature, and wear on the surfaces. Tensile stress, surface properties, and the presence of defects in the gear roots affect the occurrence of failures [12,13]. Failures are generally classified as lubricated or unlubricated. Lubricated failures include problems such as insufficient lubrication and pitting, whereas unlubricated failures are associated with breakages due to excessive loads [10,14]. These failures can increase the vibration and noise levels of the system, leading to serious damage and economic losses [15–17]. Therefore, the development of fault diagnostic technologies is of vital importance in ensuring the safe operation of rotating machines and reducing maintenance costs [18]. In recent years, condition monitoring and fault diagnosis of gearboxes have been an intensive research topic.

1.1. Related works

Fault detection in rotating machinery is usually based on the analysis of various types of data, such as vibration data, oil and bearing temperatures, torque, and current signals. Vibration-based approaches are the most widely used among these methods because the vibration signal carries traces of the fault.

Gearbox failures are usually identified through vibration measurements taken by multiple sensors. Although broken gears cause force pulses in the vibration signal, evaluating these signals accurately is a complex process.

Vibration signals in a gearbox exhibit nonlinear and Gaussian characteristics due to factors such as friction, damping, nonlinear stiffness, sudden peaks occurring in localized fault regions and load variations between gears. Such faults spread the energy over different frequencies, making it difficult to analyze the signal. Furthermore, because each gearbox produces a unique vibration signal, there is a high risk that methods or settings that are suitable for one system may fail in another system. To overcome these challenges, researchers have developed various methods that combine vibration signals, acoustic data, and signal processing techniques with machine learning algorithms. These approaches have resulted in remarkable achievements in fault detection processes. A summary of the studies in this field is given in Table 1.

Table 1. Summary of related works.

https://doi.org/10.1371/journal.pone.0350838.t001

1.2. Motivation and contributions

In previous studies, vibration data were used to train text-based classical machine learning and deep learning models to classify healthy and broken vibrations, and the resulting fault diagnosis performance fell short of state-of-the-art results. We believe that state-of-the-art results are attainable in problems with only two classes, which motivates approaching the problem from a different perspective. The proposed method converts numerical data into image data and uses pre-trained deep learning models to segment the images. The main contributions of this study are as follows:

The maximum, mean, and minimum values of the vibration data from the different sensors are used as features.
The normalized maximum, mean, and minimum values, in the range [0–255], are mapped to the Red, Green, and Blue channels of the image.
From the mapped data, 500 images of 256 × 256 pixels covering both classes are created.
The images are segmented by training the pre-trained U-Net deep learning segmentation model, which is applied to this problem for the first time.

2. Material and method

2.1. Dataset

Using SpectraQuest’s Gearbox Fault Diagnostics Simulator, broken and healthy gear data were collected at 10 different loads (0, 10, 20, 30, 40, 50, 60, 70, 80, and 90%) with four different vibration sensors (a1, a2, a3, a4). Healthy gear data file names start with ‘h’ (e.g., h30hz0.csv), and broken gear data file names start with ‘b’ (e.g., b30hz0.csv). Table 2 shows the amount of vibration data obtained from the sensors.

Table 2. Gearbox gear vibration data.

https://doi.org/10.1371/journal.pone.0350838.t002

Healthy refers to the normal operating state of the gearbox under different loads. Broken means that the gearbox performance at different loads is degraded due to broken tooth failure.

The data given in Table 2 were measured at different loads in the range [0–90] %. Fig 1 shows the data amount graph for different loads. The amount of data for different loads ranges from 90,000–115,000. In Fig 1, the horizontal axis represents the Load Rate increasing from 0 to 90, and the vertical axis represents the number of Broken and Healthy data points at these loads.

Fig 1. Number of healthy and defective gear data at different loads.

https://doi.org/10.1371/journal.pone.0350838.g001

2.2. The Proposed method

In general, our proposed method can be expressed as converting non-visual sensor vibration data into RGB images and segmenting them by training with image-based deep learning models. Fig 2 shows the general framework of the proposed method.

Fig 2. Proposed method framework.

https://doi.org/10.1371/journal.pone.0350838.g002

Although the main problem is formulated as binary classification (healthy vs. faulty), the dataset inherently reflects a more complex scenario due to multiple load conditions, multi-sensor inputs, and large-scale variability. Therefore, binary classification was used as a controlled baseline for evaluating generalization, sensor fusion, and robustness under varying operating conditions. Instead of direct classification, a segmentation-based framework—where 1D vibration signals are converted into RGB images and learned through block-based spatial encoding—was employed as a representation learning strategy. This approach allows the model to capture local patterns, inter-sensor relationships, and distributional variations more effectively. Furthermore, the proposed methodology can be naturally extended to multi-failure scenarios by adapting the segmentation masks to multi-class structures. Compared to traditional 1D and time–frequency methods, the image-based representation provides a computationally efficient and structurally robust alternative for vibration-based fault diagnosis.

As Fig 2 shows, the proposed method applies three algorithms. The first, Algorithm 1, transforms the vibration data into red, green, and blue channel data. The pseudo-code of Algorithm 1 is given below.

Algorithm 1: Conversion of the sensor vibration data into R, G, and B data

Input: a1, a2, a3, and a4 sensor vibration data

Output: Red, Green, Blue, Labels data

1: a1[], a2[], a3[], a4[] = 0, sensor_data = LoadFile(“merged_csv_file”)

2: Red[], Green[], Blue[] = 0, Labels[]

3: MaxR, MeanG, MinB = 0

4: for i in length(sensor_data):

a1[i] = sensor_data[i][1], a2[i] = sensor_data[i][2]

a3[i] = sensor_data[i][3], a4[i] = sensor_data[i][4]

MaxR = max (a1[i], a2[i], a3[i], a4[i])

MeanG = mean (a1[i], a2[i], a3[i], a4[i])

MinB = min (a1[i], a2[i], a3[i], a4[i])

Labels[i] = sensor_data[i][5]

Red[i] = Normalize (MaxR, [0–255])

Green[i] = Normalize (MeanG, [0–255])

Blue[i] = Normalize (MinB, [0–255])

5: end for

6: return Red[], Green[], Blue[], Labels[]
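The encoding in Algorithm 1 can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the function names `encode_rgb` and `min_max_to_uint8` are hypothetical, the toy rows are invented for illustration, and the sketch assumes that normalization is a min-max scaling computed over all rows of each feature, which the pseudo-code does not state explicitly.

```python
from statistics import mean

def min_max_to_uint8(values):
    """Scale a list of numbers to integers in [0, 255] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

def encode_rgb(rows):
    """rows: iterable of (a1, a2, a3, a4, label) tuples.

    Returns (red, green, blue, labels), one entry per input row.
    """
    maxima, means, minima, labels = [], [], [], []
    for a1, a2, a3, a4, label in rows:
        sensors = (a1, a2, a3, a4)
        maxima.append(max(sensors))   # feature mapped to the Red channel
        means.append(mean(sensors))   # feature mapped to the Green channel
        minima.append(min(sensors))   # feature mapped to the Blue channel
        labels.append(label)
    return (min_max_to_uint8(maxima), min_max_to_uint8(means),
            min_max_to_uint8(minima), labels)

# Toy rows (hypothetical values, not taken from the dataset).
rows = [
    (1.0, 2.0, 3.0, 4.0, 1),
    (0.0, 0.0, 0.0, 0.0, 0),
    (-2.0, 2.0, 6.0, 10.0, 1),
]
red, green, blue, labels = encode_rgb(rows)
print(red, green, blue, labels)
```

Each row yields one RGB pixel, so the class label of the row can later be attached to that pixel when the segmentation mask is built.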

Algorithm 2 describes the creation of 500 images from the red, green, and blue data obtained. Algorithm 3 covers training the U-Net deep learning model on the generated RGB images and segmenting them. The calculated Max, Mean, and Min values are normalized to the range [0–255], since each channel is 8-bit, before being assigned to the Red, Green, and Blue channels. Because the input of the U-Net deep learning model is 256 × 256, the images were created at a size of 256 × 256 pixels. 80% of the generated images were used for training and 20% for testing. After model training, segmented images with a 4 × 4 block layout were obtained.

Algorithm 2 was used to create labeled images with Red, Green, Blue and Labels data obtained according to Algorithm 1. The pseudo-code of Algorithm 2 is given below.

Algorithm 2: Conversion of Red, Green, Blue data to image

Input: Red[], Green[], Blue[], Labels[]

Output: 500 images with labels

1: Create folder “images” and “response”

2: image_size = 256, num_images = 500, num_blocks = 4

3: block_size = image_size / num_blocks

4: for i in num_images:

4.1: red_channel[], green_channel[], blue_channel[] = 0, label_rgb[] = 0

4.2: for j in num_blocks x num_blocks:

calculate block start and end coordinates

selected_class = Choose a random class (0 or 1)

find indexes of this class

4.2.1: If there is enough data of this class then

bs = randomly select block_size² indexes of this class

red_channel[start..end] = Red[bs]

green_channel[start..end] = Green[bs]

blue_channel[start..end] = Blue[bs]

label_rgb[start..end] = selected_class*255

End If

4.2.2: image[j] = merge (red_channel[], green_channel[], blue_channel[])

response[j] = merge (label_rgb[])

4.3: end for

image_i = merge (image[j])

response_i = merge (response[j])

4.4: save_image(image_i), save_image(response_i)

5: end for

6: return images, labels
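The block-wise construction in Algorithm 2 can be sketched in Python as follows. This is an illustrative reading of the pseudo-code rather than the authors' MATLAB code: the function name `build_image_and_mask`, its parameters, and the toy pixel pools are hypothetical, and the demonstration uses a small 8 × 8 image so that the result is easy to inspect.

```python
import random

def build_image_and_mask(pixels_by_class, image_size=256, num_blocks=4, rng=None):
    """pixels_by_class: {0: [(r, g, b), ...], 1: [(r, g, b), ...]}.

    Returns (image, mask): image[y][x] is an (r, g, b) tuple and
    mask[y][x] is 0 (class 0) or 255 (class 1).
    """
    rng = rng or random.Random()
    block = image_size // num_blocks
    image = [[(0, 0, 0)] * image_size for _ in range(image_size)]
    mask = [[0] * image_size for _ in range(image_size)]
    for by in range(num_blocks):
        for bx in range(num_blocks):
            cls = rng.choice((0, 1))            # random class for this block
            pool = pixels_by_class[cls]
            if len(pool) < block * block:
                continue                        # not enough data: block stays empty
            chosen = rng.sample(pool, block * block)
            for k, rgb in enumerate(chosen):    # fill the block row by row
                y = by * block + k // block
                x = bx * block + k % block
                image[y][x] = rgb
                mask[y][x] = cls * 255
    return image, mask

# Toy pools (hypothetical pixel values), 8 x 8 image with 2 x 2 blocks.
pools = {
    0: [(i, i, i) for i in range(16)],
    1: [(200 + i, 100, 50) for i in range(16)],
}
image, mask = build_image_and_mask(pools, image_size=8, num_blocks=4,
                                   rng=random.Random(0))
for row in mask:
    print(row)
```

Every block in the printed mask is uniform, which mirrors the block-based ground truth described in the text.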

Sample images generated with Algorithm 2 are given in Fig 3. As shown in Fig 3, each generated image is paired with a Ground Truth image representing block-based class information. The black and white blocks represent the Broken and Healthy labels, respectively. The images and labels (classes) obtained with Algorithm 2 were used to train the U-Net deep learning model according to Algorithm 3.

Fig 3. Generated images (a) Image (b) Ground Truth (Labels).

https://doi.org/10.1371/journal.pone.0350838.g003

Algorithm 3: Training of images with U-Net

Input: images, responses

Output: metrics, labels

1: classNames = ["0", "1"]

img_train = split (images, 1:400), img_test = split (images, 401:500)

labels_train = split (responses, 1:400), labels_test = split (responses, 401:500)

2: trainingData = merge(img_train, labels_train)

3: numClasses = numel(classNames), imageSize = [256 256 3]

4: lgraph = unetLayers(imageSize, numClasses)

5: options = trainingOptions('adam', 'InitialLearnRate', 1e-4, 'MaxEpochs', 10, 'MiniBatchSize', 16, 'Plots', 'training-progress', 'ValidationData', pixelLabelImageDatastore(img_test, labels_test), 'ValidationFrequency', 10)

6: net = trainNetwork(trainingData, lgraph, options)

7: labels = semanticseg(img_test, net, 'MiniBatchSize', 16)

metrics = evaluateSemanticSegmentation(labels, labels_test)

8: return metrics, labels

2.3. Full workflow description

Fig 4 summarizes the general methodology we propose. As shown in Fig 4, the proposed method converts non-visual vibration sensor signals into RGB images and performs classification through semantic segmentation using a U-Net–based deep learning architecture. The complete workflow consists of four main stages: (i) data acquisition and preprocessing, (ii) feature extraction and RGB encoding, (iii) synthetic image generation, and (iv) deep learning–based segmentation and evaluation using performance metrics.

Fig 4. Workflow overview of the proposed method.

https://doi.org/10.1371/journal.pone.0350838.g004

Combining raw data and preprocessing: The raw data consist of vibration signals collected from four sensors, denoted as a1, a2, a3, and a4. The combined CSV file includes four synchronized vibration measurements along with class labels indicating fault conditions. After loading the combined dataset, the following preprocessing steps are applied, including parsing the sensor measurement data and label information (Algorithm 1):

Load the combined vibration dataset
Parse sensor data and labels
Extract statistical features (max, mean, min)
Normalize values to the RGB intensity range [0, 255]

No signal filtering is applied prior to feature extraction. The preprocessing pipeline can be summarized as follows:

Sensor vibration data → Feature extraction → Normalization → RGB channel mapping

For each index i (i.e., for each row), the four sensor values are used to calculate statistical features representing the current vibration state. Accordingly, the set can be defined as:

$$A_i = \{a_1(i),\ a_2(i),\ a_3(i),\ a_4(i)\}$$

The computation of the max, mean, and min statistical values is given in Equations (1)–(3).

$$\mathrm{MaxR}_i = \max(A_i) \tag{1}$$

$$\mathrm{MeanG}_i = \frac{1}{4}\sum_{k=1}^{4} a_k(i) \tag{2}$$

$$\mathrm{MinB}_i = \min(A_i) \tag{3}$$

The obtained $\mathrm{MaxR}_i$, $\mathrm{MeanG}_i$, and $\mathrm{MinB}_i$ values represent the vibration distribution at each time interval. These values are subsequently used for normalization and RGB encoding. An image consists of three channels, R, G, and B, each with an 8-bit depth. Therefore, the $\mathrm{MaxR}_i$, $\mathrm{MeanG}_i$, and $\mathrm{MinB}_i$ values are normalized as defined in Equation (4).

$$I = 255 \cdot \frac{x - \min(x)}{\max(x) - \min(x)} \tag{4}$$

Here, I represents the image pixel, and x denotes the max, mean, and min values.

The normalized max, mean, and min values are then mapped to the R, G, and B channels, respectively. This mapping yields pixel values for the i-th time interval. These pixels are used to construct images of size 256 × 256.

Each image is divided into 4 × 4 = 16 blocks, where each block contains 256/4 = 64 pixels along both the horizontal and vertical dimensions. Therefore, each image block has a size of 64 × 64 pixels.
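The block arithmetic above also fixes how many vibration samples one image consumes. The short Python sketch below works through the numbers; the variable names are illustrative, and the only assumption is the one stated in the text, namely that each pixel corresponds to one encoded row of sensor data.

```python
image_size = 256
num_blocks = 4                                   # blocks per side

block_size = image_size // num_blocks            # pixels per block side
samples_per_block = block_size ** 2              # one encoded row per pixel
samples_per_image = samples_per_block * num_blocks ** 2

print(block_size)          # 64
print(samples_per_block)   # 4096
print(samples_per_image)   # 65536
```

Each 64 × 64 block therefore holds 4,096 encoded samples of a single class, and a full 256 × 256 image holds 65,536.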

In Algorithm 2, during block construction, each block is filled using vibration samples belonging to a selected class. The images created from these blocks are generated as follows:

Select a random class label (0 or 1)
Retrieve vibration samples corresponding to the selected class
Randomly select a sufficient number of RGB samples
Fill the corresponding block region with these RGB values
Assign the block class value in the segmentation mask

The segmentation mask classes are defined as:

0 → class 0 (broken)
255 → class 1 (healthy)

In this manner, each block corresponds to a uniform class region in the ground-truth mask.

Each vibration sample initially has a single class label. When samples are placed into image blocks:

All pixels generated from these samples inherit the same class label
The class label is assigned to the corresponding region in the segmentation mask

Thus, a fixed mask is generated on a block-wise basis. If a block corresponds to class 1, the mask pixel value is set to 255; for class 0, it is set to 0. In this way, vibration-based class labels are transformed into pixel-level segmentation masks.

Rationale for Choosing Segmentation Instead of Direct Classification: Although the problem involves two classes, a semantic segmentation framework is preferred over direct classification for the following reasons:

It enables learning of spatial distributions within synthetic images.
It improves robustness against noise in vibration patterns.
The U-Net architecture provides strong feature extraction through its encoder–decoder structure.
Block-based labeling allows the network to learn local vibration patterns rather than relying on a single global decision.

In total, 500 RGB images are generated. Of these, 80% (400 images) are used for training, and 20% (100 images) are used for testing. Since both image blocks and entire images are randomly generated during the image synthesis process, additional shuffling of the dataset is not required. The training images are used to train the U-Net model, while the test images are used for validation and evaluation.
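The 80/20 split can be written as a simple positional cut, matching the 1:400 and 401:500 ranges used in Algorithm 3. The Python sketch below is illustrative only; `split_dataset` and the file names are hypothetical, and integer arithmetic is used so that the cut point is exact.

```python
def split_dataset(items, train_percent=80):
    """Split a sequence positionally: the first part trains, the rest tests."""
    cut = len(items) * train_percent // 100
    return items[:cut], items[cut:]

# Hypothetical file names standing in for the 500 generated images.
image_names = [f"image_{i:03d}.png" for i in range(1, 501)]
train, test = split_dataset(image_names)
print(len(train), len(test))   # 400 100
```

Because the images are already generated in random order, no additional shuffle is applied before the cut.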

The configuration of the U-Net network is as follows:

Input size: 256 × 256 × 3
Number of classes: 2
Optimization algorithm: Adam
Learning rate: 1 × 10⁻⁴
Number of epochs: 10
Mini-batch size: 16

Performance is evaluated using semantic segmentation metrics.

The overall workflow can be summarized as follows:

Load the combined vibration dataset
Separate sensor channels and labels
Compute maximum, mean, and minimum features
Normalize features to the [0–255] range
Map features to RGB channels
Generate 500 synthetic 256 × 256 RGB images (total of 16 blocks per image)
Create corresponding segmentation masks using block-based labeling
Split the dataset into training and test sets
Train the U-Net segmentation model
Perform segmentation on the test images
Evaluate performance using segmentation metrics

2.4. U-Net image segmentation deep learning model

U-Net is a convolutional neural network model used especially in medical image segmentation [47–50]. Segmentation of a 512 × 512 image can be performed on a graphics processing unit (GPU) in less than one second. The U-Net architecture has also been used in diffusion models for noise removal in images [51]. The basic idea of the architecture is to supplement a standard contracting network with successive layers in which pooling operations are replaced by up-sampling operators. These layers increase the output resolution, and subsequent convolutional layers learn to use this high-resolution information to generate a precise output. A notable innovation of U-Net is the inclusion of several feature channels in the up-sampling part, which allow the network to transfer context information to higher-resolution layers. As a result, the expanding path is almost symmetrical to the contracting path, yielding a u-shaped architecture [47,48,52–54].

U-Net has a u-shaped architecture consisting of a contracting path and an expanding path. The contracting path is a classical convolutional network consisting of repeated convolution operations, each followed by a rectified linear unit (ReLU) and a max pooling operation. By combining high-resolution features from the contracting path through a series of up-convolution and merging operations, the expanding path recovers both feature and spatial information [55]. Fig 5 shows the U-Net architecture. As shown in Fig 5, the generated image and block labels are fed into a 6-level U-Net model to predict the classes of the blocks in the image. Blue arrows represent 3 × 3 Convolution and ReLU at the horizontal level, red arrows represent 2 × 2 MaxPooling between levels, green arrows represent 2x2 Pooling, and the pink arrow represents 1 × 1 Convolution.
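To make the contracting and expanding paths concrete, the following Python sketch traces the spatial size of the feature maps through the levels. It is a sketch under stated assumptions: the convolutions are taken to preserve spatial size (same padding), so only the 2 × 2 pooling and up-sampling steps change the resolution, and the level count of 6 is taken from the description above. The function names are hypothetical.

```python
def encoder_sizes(input_size=256, levels=6, pool=2):
    """Spatial size at each level of the contracting path."""
    sizes = [input_size]
    for _ in range(levels - 1):
        sizes.append(sizes[-1] // pool)   # 2 x 2 max pooling halves the size
    return sizes

def decoder_sizes(enc_sizes, up=2):
    """Spatial size at each level of the expanding path."""
    sizes = [enc_sizes[-1]]
    for _ in range(len(enc_sizes) - 1):
        sizes.append(sizes[-1] * up)      # 2 x 2 up-sampling doubles the size
    return sizes

enc = encoder_sizes()
dec = decoder_sizes(enc)
print(enc)   # [256, 128, 64, 32, 16, 8]
print(dec)   # [8, 16, 32, 64, 128, 256]
```

Under these assumptions the output mask has the same 256 × 256 resolution as the input image, which is what allows a per-pixel comparison with the ground-truth mask.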

Fig 5. U-Net architecture.

https://doi.org/10.1371/journal.pone.0350838.g005

3. Experimental results

For the implementation of the proposed method, a computer with an Intel i7 8th-generation processor and 32 GB of memory was used, and the application was developed in MATLAB. Vibration data graphs for the four sensors at different loads are given in Fig 6. The graphs show that the broken-gear and healthy-gear plots are clearly separated at the different loads, which suggests that high performance results can be obtained. In Fig 6, the horizontal axis shows time over a 10-second interval, and the vertical axis shows the vibration change in m/s².

Fig 6. Graphs of broken and healthy gear sensor data at different loads (0%−90%) (a) Broken (b) Healthy.

https://doi.org/10.1371/journal.pone.0350838.g006

RGB histogram graphs of the images obtained with Algorithm 2 are shown in Fig 7. The plots indicate that RGB images for different loads can be segmented by the U-Net model with high performance. In Fig 7, the horizontal axis shows the pixel values of the Red, Green, and Blue channels in the range [0–255], and the vertical axis shows the number of pixels with each value in each channel.

Fig 7. RGB image histogram plots obtained for different loads (a) Broken (b) Healthy.

https://doi.org/10.1371/journal.pone.0350838.g007

Fig 8 shows the confusion matrix results calculated for different loads. The results show that the distribution is generally excellent for all loads and is most pronounced at loads from 30% to 60% and at 90%.

Fig 8. Confusion matrix results for all loads.

https://doi.org/10.1371/journal.pone.0350838.g008

Table 3 shows the class-based performance results for all loads. As shown in Table 3, the healthy class is diagnosed with an accuracy of 99.99% at all loads. For the broken class, 99.53% accuracy was obtained at 60% load.

Table 3. Class-based performance results.

https://doi.org/10.1371/journal.pone.0350838.t003

The general performance results for the two classes are given in Table 4. The strongest results are obtained at 30%, 50%, 60%, and 90% loads, consistent with the confusion matrices in Fig 8. The best result (99.76%) was obtained at 60% load.

Table 4. General performance results for different load rates.

https://doi.org/10.1371/journal.pone.0350838.t004

The validation accuracy values are presented in Table 5. The obtained validation accuracy results show that the proposed method is consistent.

Table 5. Validation accuracy results.

https://doi.org/10.1371/journal.pone.0350838.t005

Fig 9 shows the accuracy bar graphs for all loads. In Fig 9, the horizontal axis shows different load ratios, and the vertical axis shows the Accuracy value as a percentage for different loads.

Fig 9. Accuracy bar charts for all loads.

https://doi.org/10.1371/journal.pone.0350838.g009

Fig 10 shows the results obtained with U-Net on sample images and labels.

Fig 10. Results of the U-Net implementation (a) Image (b) Ground Truth (c) U-Net (d) Performance Metrics (%).

https://doi.org/10.1371/journal.pone.0350838.g010

4. Discussion

For each load, 500 images were created, and the overall results were obtained by applying U-Net to all 5000 images. Figs 11 and 12 show the accuracy and loss graphs for the training and validation data over 10 epochs. In Fig 11, the horizontal axis shows the iterations over the 10 epochs, and the vertical axis shows the Accuracy as a percentage. Similarly, in Fig 12, the horizontal axis shows the iterations over the 10 epochs, and the vertical axis shows the Loss as a percentage. Both Accuracy and Loss stabilize from epoch 6 onward, so 10 epochs are sufficient for the training process.

Fig 11. Accuracy graphs (5000 images, 10 epochs).

https://doi.org/10.1371/journal.pone.0350838.g011

Fig 12. Loss plots (5000 images, 10 epochs).

https://doi.org/10.1371/journal.pone.0350838.g012

Fig 13 shows the confusion matrix obtained as a result of training.

Fig 13. Confusion matrix for 5000 images training.

https://doi.org/10.1371/journal.pone.0350838.g013

The accuracy, precision, recall, F1-score, and IoU (mAP) values reported as performance metrics in Table 6 were calculated according to Equations (5), (6), (7), (8), and (9), respectively. Accuracy (Equation 5) represents the percentage of correct predictions made by the model; Precision (Equation 6) indicates the proportion of samples predicted as broken that are actually faulty; Recall (Equation 7) measures the proportion of truly faulty samples that are correctly identified by the model; F1-score (Equation 8) summarizes the overall performance as the harmonic mean of Precision and Recall; and IoU (mAP) (Equation 9) represents the average overlap ratio between the predicted and ground truth segmentation masks across all images. IoU is a demanding criterion, because a high score requires the predicted region and the ground truth region to coincide almost one-to-one.

Table 6. Comparison with other methods for gearbox diagnostics.

https://doi.org/10.1371/journal.pone.0350838.t006

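$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{9}$$

Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively, with the broken class taken as positive.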
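The five metrics can be computed directly from pixel-level confusion counts. The Python sketch below uses the standard definitions of these metrics; the function name `segmentation_metrics` is hypothetical, and the counts in the example are invented for illustration and are not results from this study.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def segmentation_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, F1-score, and IoU from counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "iou": _ratio(tp, tp + fp + fn),
    }

# Hypothetical counts, chosen only to exercise the formulas.
m = segmentation_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Averaging the per-image IoU values over the test set would give the mean score that the text refers to as IoU (mAP).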

Fig 14 shows the segmentation sample images obtained from U-Net for 5000 images.

Fig 14. Total of 5000 images and U-Net overlapping images (a) Image (b) Ground Truth (c) U-Net.

https://doi.org/10.1371/journal.pone.0350838.g014

Table 6 shows that recent studies have focused on deep learning methods. In Raji et al. [10], similar machine learning methods do not exceed accuracy values of 87%−88%. Ahmed et al. [56] obtained an accuracy of 91% using a deep learning-based autoencoder method. Mayo et al. obtained an accuracy of 98.68%, close to the state of the art, using a pre-trained deep learning model built with the Keras Sequential API. The proposed method yielded state-of-the-art results with 99.87% accuracy.

The segmentation performed in this study does not aim to localize faults on the physical components of the gear system. Instead, it represents patterns in an artificial image space constructed from statistical features extracted from vibration data. Therefore, the resulting segmentation output highlights feature regions associated with faults, rather than indicating physical locations.

In this study, segmentation serves as a representation learning tool: the spatially structured representation captures inter-sensor relationships and local patterns and supports a global classification decision.

The high accuracy and IoU values obtained demonstrate that the proposed method is highly successful in distinguishing between healthy and faulty states; however, it should be noted that this performance is also influenced by the binary nature of the problem. The main contribution of the proposed approach is the integration of multi-sensor data through an image-based representation of vibration signals, enabling the learning of local patterns. Compared to existing methods in the literature, the proposed approach offers not only high performance but also an alternative data representation strategy. However, the computational cost of the model requires optimization for real-time applications.

5. Conclusion

Classical machine learning methods diagnose gearbox faults with approximately 90% accuracy, whereas deep learning methods achieve higher accuracy. In this study, an image-based convolutional approach is adopted instead of the classical deep learning approaches previously applied to numerical data. For this purpose, the numerical data were converted into images, and a state-of-the-art accuracy (99.87%) was obtained using the pre-trained U-Net deep learning model for image segmentation. In addition, the mAP (IoU) value reached 99.74%. These results demonstrate the effectiveness of the proposed approach.

Rather than directly proposing a new classification algorithm, this study presents an image-based representation of vibration data and a segmentation-based learning approach built upon this representation. In this respect, the contribution should be regarded not only as a performance improvement but also as a methodological alternative for data representation. In future work, we plan to apply the proposed approach to similar diagnostic problems. A portable application based on this approach could also be developed for industrial use to support highly reliable diagnosis.

5.1. Limitations

While the proposed method demonstrates high accuracy and state-of-the-art performance, several limitations should be considered. First, the study is formulated as a binary classification problem (healthy vs. faulty). Although this design enables controlled evaluation under varying load conditions, it does not fully capture the complexity of real-world gearbox diagnostics, where multiple failure types (e.g., wear, misalignment, and lubrication defects) may coexist. Therefore, the proposed method should be regarded as a preliminary yet efficient validation framework rather than a comprehensive fault diagnosis system. Second, the proposed approach relies on a synthetic image generation methodology based on statistical features (maximum, mean, and minimum). While this transformation facilitates the use of image-based deep learning models, it may result in the loss of important information related to temporal dependencies and frequency-domain characteristics, which are often critical in vibration analysis.

Third, segmentation masks are generated using a block-based labeling strategy, in which a single class is assigned to each block. Although this simplifies the learning process, it does not accurately reflect pixel-level fault localization observed in real-world scenarios. Consequently, the segmentation task is artificially constructed, which may limit its interpretability and practical applicability.
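As a rough illustration of the block-based labeling described above, the following Python sketch expands a coarse grid of block labels into a full-resolution mask. The grid size, the label values (0 for healthy, 1 for faulty), and the helper name `block_mask` are assumptions made for illustration, not the exact procedure used in the study.

```python
import numpy as np


def block_mask(block_labels, side: int = 256) -> np.ndarray:
    """Expand a small grid of class ids into a (side, side) segmentation mask.

    Every pixel inside a block receives that block's single class id.
    """
    labels = np.asarray(block_labels, dtype=np.uint8)
    if labels.ndim != 2:
        raise ValueError("block_labels must be a 2-D grid")
    rows, cols = labels.shape
    if side % rows or side % cols:
        raise ValueError("side must be divisible by the grid dimensions")

    # Repeat each label along both axes so one grid cell becomes one block.
    mask = np.repeat(labels, side // rows, axis=0)
    return np.repeat(mask, side // cols, axis=1)


# Hypothetical 2 x 2 grid: 0 = healthy, 1 = faulty.
mask = block_mask([[0, 1], [1, 0]])
```

Because all pixels in a block share one label, such a mask carries no localization finer than the block grid, which is why the segmentation task is described above as artificially constructed.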

Fourth, the study does not include a direct comparison with established 1D approaches, such as 1D CNNs, LSTM networks, or time–frequency representation (TFR)-based methods [58]. Since these techniques are widely used in vibration-based fault diagnosis and often achieve high accuracy, their absence limits the ability to fully position the proposed method within the existing literature. Finally, although the dataset includes multiple load conditions, all data were collected using a controlled experimental setup (SpectraQuest Transmission Simulator). This may restrict the generalizability of the results to real industrial environments, where noise, sensor placement variability, and operational uncertainties are more prominent.

5.2. Future work

Future studies will address these limitations and extend the proposed framework in several directions. First, the method will be evaluated in multi-class and multi-failure diagnostic scenarios. By extending the segmentation masks to include multiple fault categories, the proposed methodology can better represent realistic industrial conditions in which different fault types may occur simultaneously. Second, more advanced signal representations will be explored. In particular, integrating time–frequency representations (e.g., spectrograms and wavelet transforms), or combining them with the proposed RGB encoding, may improve feature preservation and enhance model performance. Third, the image generation process will be refined to reduce information loss and better capture temporal dynamics. This may involve sliding window techniques, adaptive feature extraction methods, or hybrid representations that combine statistical and ordinal features.
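One possible starting point for the time-frequency direction mentioned above is a short-time Fourier magnitude computed over sliding windows. The sketch below uses NumPy only; the window length, step, Hann taper, and function name `stft_magnitude` are illustrative choices and not part of the proposed method.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft_magnitude(x: np.ndarray, window: int = 256, step: int = 128) -> np.ndarray:
    """Return a (n_frames, window // 2 + 1) magnitude spectrogram of a 1-D signal."""
    x = np.asarray(x, dtype=np.float64)
    # All overlapping windows, then keep every `step`-th one.
    frames = sliding_window_view(x, window)[::step]
    # Taper each frame to reduce spectral leakage before the FFT.
    tapered = frames * np.hanning(window)
    return np.abs(np.fft.rfft(tapered, axis=1))


# Example: a 64 Hz sine sampled at 1024 Hz for one second.
fs = 1024
t = np.arange(fs) / fs
spec = stft_magnitude(np.sin(2 * np.pi * 64 * t))
```

Each row of `spec` could, for example, be scaled to [0, 255] and stacked as an additional image channel, which is one way a hybrid of statistical and time-frequency features might be explored.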

Fourth, a comprehensive benchmarking study will be conducted against advanced 1D and 2D deep learning models, including 1D CNNs, LSTMs, and TFR-based CNN architectures. This will provide a clearer understanding of the strengths and limitations of the proposed approach. Fifth, the proposed method will be validated using real-world industrial datasets to evaluate its robustness and generalization capability under practical conditions. Finally, future work will focus on developing lightweight and real-time implementations of the model for deployment in embedded or edge-based condition monitoring systems.

References

1. Kumar S, Kumar V, Sarangi S, Singh OP. Gearbox fault diagnosis: A higher order moments approach. Measurement. 2023;210:112489.
2. Wang S, Tian J, Liang P, Xu X, Yu Z, Liu S, et al. Single and simultaneous fault diagnosis of gearbox via wavelet transform and improved deep residual network under imbalanced data. Engineering Applications of Artificial Intelligence. 2024;133:108146.
3. Kumar V, Mukherjee S, Verma AK, Sarangi S. An AI-Based Nonparametric Filter Approach for Gearbox Fault Diagnosis. IEEE Trans Instrum Meas. 2022;71:1–11.
4. Parey A, Singh A. Gearbox fault diagnosis using acoustic signals, continuous wavelet transform and adaptive neuro-fuzzy inference system. Applied Acoustics. 2019;147:133–40.
5. Cerrada M, Zurita G, Cabrera D, Sánchez R-V, Artés M, Li C. Fault diagnosis in spur gears based on genetic algorithm and random forest. Mechanical Systems and Signal Processing. 2016;70–71:87–103.
6. Li C, Sanchez R-V, Zurita G, Cerrada M, Cabrera D, Vásquez RE. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals. Mechanical Systems and Signal Processing. 2016;76–77:283–93.
7. Kumar V, Rai A, Mukherjee S, Sarangi S. A Lagrangian approach for the electromechanical model of single-stage spur gear with tooth root cracks. Eng Fail Anal. 2021;129:105662.
8. Zhang Y, Lu W, Chu F. Planet gear fault localization for wind turbine gearbox using acoustic emission signals. Renewable Energy. 2017;109:449–60.
9. Wang Z, Yang J, Guo Y. Unknown fault feature extraction of rolling bearings under variable speed conditions based on statistical complexity measures. Mechanical Systems and Signal Processing. 2022;172:108964.
10. Raji NA, Kuku RO, Bakare AO, Ogunbiyi MM, Morafa TI. Comparative analysis of gearbox fault detection using ensemble learning techniques with vibration sensor data. J Prod Eng. 2024;27(2):1–9.
11. Al-Atat H, Siegel D, Lee J. A Systematic Methodology for Gearbox Health Assessment and Fault Classification. IJPHM. 2011;2(1).
12. Mohammed OD, Rantatalo M, Aidanpää J-O. Dynamic modelling of a one-stage spur gear system and vibration-based tooth crack detection analysis. Mechanical Systems and Signal Processing. 2015;54–55:293–305.
13. Liang X, Zuo MJ, Feng Z. Dynamic modeling of gearbox faults: A review. Mechanical Systems and Signal Processing. 2018;98:852–76.
14. Liang P, Deng C, Wu J, Yang Z, Zhu J, Zhang Z. Compound Fault Diagnosis of Gearboxes via Multi-label Convolutional Neural Network and Wavelet Transform. Computers in Industry. 2019;113:103132.
15. Pacheco F, Valente de Oliveira J, Sánchez R-V, Cerrada M, Cabrera D, Li C, et al. A statistical comparison of neuroclassifiers and feature selection methods for gearbox fault diagnosis under realistic conditions. Neurocomputing. 2016;194:192–206.
16. Wang Y, Yang S, Sanchez RV. Gearbox Fault Diagnosis Based on a Novel Hybrid Feature Reduction Method. IEEE Access. 2018;6:75813–23.
17. Wang Z, Wang J, Wang Y. An intelligent diagnosis scheme based on generative adversarial learning deep neural networks and its application to planetary gearbox fault pattern recognition. Neurocomputing. 2018;310:213–22.
18. Zhao M, Kang M, Tang B, Pecht M. Deep Residual Networks With Dynamically Weighted Wavelet Coefficients for Fault Diagnosis of Planetary Gearboxes. IEEE Trans Ind Electron. 2018;65(5):4290–300.
19. Krishna Durbhaka G, Selvaraj B, Mittal M, Saba T, Rehman A, Mohan Goyal L. Swarm-LSTM: Condition Monitoring of Gearbox Fault Diagnosis Based on Hybrid LSTM Deep Neural Network Optimized by Swarm Intelligence Algorithms. Computers, Materials & Continua. 2021;66(2):2041–59.
20. Vrba J, Cejnek M, Steinbach J, Krbcova Z. A Machine Learning Approach for Gearbox System Fault Diagnosis. Entropy (Basel). 2021;23(9):1130. pmid:34573755
21. Yu J, Zhou X, Lu L, Zhao Z. Multiscale Dynamic Fusion Global Sparse Network for Gearbox Fault Diagnosis. IEEE Trans Instrum Meas. 2021;70:1–11.
22. Yao Y, Zhang S, Yang S, Gui G. Learning attention representation with a multi-scale cnn for gear fault diagnosis under different working conditions. Sensors. 2020;20(4).
23. Ye Z, Yu J. AKRNet: A novel convolutional neural network with attentive kernel residual learning for feature learning of gearbox vibration signals. Neurocomputing. 2021;447:23–37.
24. Chen S-N, Liu F, Gao C-X, Li J. Gearbox Fault Diagnosis Classification with Empirical Mode Decomposition Based on Improved Long Short-Term Memory. In: 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), 2021. 568–75.
25. Zhao M, Kang M, Tang B, Pecht M. Deep Residual Networks With Dynamically Weighted Wavelet Coefficients for Fault Diagnosis of Planetary Gearboxes. IEEE Trans Ind Electron. 2018;65(5):4290–300.
26. Liu X, Zhou Q, Zhao J, Shen H, Xiong X. Fault diagnosis of rotating machinery under noisy environment conditions based on a 1-D convolutional autoencoder and 1-D convolutional neural network. Sensors. 2019;19(4).
27. Wang H, Xu J, Sun C, Yan R, Chen X. Intelligent Fault Diagnosis for Planetary Gearbox Using Time-Frequency Representation and Deep Reinforcement Learning. IEEE/ASME Trans Mechatron. 2022;27(2):985–98.
28. He J, Yang S, Gan C. Unsupervised fault diagnosis of a gear transmission chain using a deep belief network. Sensors. 2017;17(7):1–21.
29. Li X, Li J, Qu Y, He D. Semi-supervised gear fault diagnosis using raw vibration signal based on deep learning. Chinese Journal of Aeronautics. 2020;33(2):418–26.
30. Saufi SR, Ahmad ZAB, Leong MS, Lim MH. Gearbox Fault Diagnosis Using a Deep Learning Model With Limited Data Sample. IEEE Trans Ind Inf. 2020;16(10):6263–71.
31. C S, Sun G, Wang Y. Intelligent detection of a planetary gearbox composite fault based on adaptive separation and deep learning. Sensors. 2019.
32. Zadeh MH, Kia SH, Nourani M, Henao H, Capolino G-A. Gear fault diagnosis using discrete wavelet transform and deep neural networks. In: IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society, 2016. 1494–500.
33. Chen ZQ, Li C, Sanchez RV. Gearbox fault identification and classification with convolutional neural networks. Shock and Vibration. 2015;2015.
34. Li Y, Cheng G, Pang Y, Kuai M. Planetary gear fault diagnosis via feature image extraction based on multi central frequencies and vibration signal frequency spectrum. Sensors. 2018;18(6).
35. Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A. 1998;454(1971):903–95.
36. Ha JM, Youn BD. A Health Data Map-Based Ensemble of Deep Domain Adaptation Under Inhomogeneous Operating Conditions for Fault Diagnosis of a Planetary Gearbox. IEEE Access. 2021;9:79118–27.
37. Shi J, Peng D, Peng Z, Zhang Z, Goebel K, Wu D. Planetary gearbox fault diagnosis using bidirectional-convolutional LSTM networks. Mech Syst Signal Process. 2022;162(August 2020):107996.
38. Ye Z, Yu J. Deep morphological convolutional network for feature learning of vibration signals and its applications to gearbox fault diagnosis. Mechanical Systems and Signal Processing. 2021;161:107984.
39. Zhang K, Tang B, Deng L, Liu X. A hybrid attention improved ResNet based fault diagnosis method of wind turbines gearbox. Measurement (Lond). 2021;179(November 2020):109491.
40. Chen R, Huang X, Yang L, Xu X, Zhang X, Zhang Y. Intelligent fault diagnosis method of planetary gearboxes based on convolution neural network and discrete wavelet transform. Computers in Industry. 2019;106:48–59.
41. Yang L, Chen H. Fault diagnosis of gearbox based on RBF-PF and particle swarm optimization wavelet neural network. Neural Comput & Applic. 2018;31(9):4463–78.
42. Azamfar M, Singh J, Bravo-Imaz I, Lee J. Multisensor data fusion for gearbox fault diagnosis using 2-D convolutional neural network and motor current signature analysis. Mech Syst Signal Process. 2020;144:106861.
43. Feng Z, Gao A, Li K, Ma H. Planetary gearbox fault diagnosis via rotary encoder signal analysis. Mechanical Systems and Signal Processing. 2021;149:107325.
44. Yao G, Wang Y, Benbouzid M, Ait-ahmed M. A Hybrid Gearbox Fault Diagnosis Method Based on GWO-VMD and DE-KELM. Applied Sciences. 2021.
45. Zhang W, Peng G, Li C, Chen Y, Zhang Z. A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals. Sensors. 2017;17(2).
46. Zhang Y, Ding J, Li Y, Ren Z, Feng K. Multi-modal data cross-domain fusion network for gearbox fault diagnosis under variable operating conditions. Eng Appl Artif Intell. 2024;133(PC):108236.
47. Ronneberger O, Fischer PF, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III, 2015. 234–41.
48. Shelhamer E, Long J, Darrell T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39(4):640–51. pmid:27244717
49. Neha F, Bhati D, Shukla DK, Dalvi SM, Mantzou N, Shubbar S. An analytics-driven review of U-Net for medical image segmentation. Healthcare Analytics. 2025;8:100416.
50. Jiangtao W, Ruhaiyem NIR, Panpan F. A comprehensive review of U-Net and its variants: advances and applications in medical image segmentation. IET Image Process. 2025;19(1):e70019.
51. Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. In: Advances in Neural Information Processing Systems, 2020.
52. Sariturk B, Seker DZ. A Residual-Inception U-Net (RIU-Net) Approach and Comparisons with U-Shaped CNN and Transformer Models for Building Segmentation from High-Resolution Satellite Images. Sensors. 2022;22(19).
53. Zhang P, Chen C. Time–Frequency Analysis for Planetary Gearbox Fault Diagnosis Based on Improved U-Net++. J Fail Anal and Preven. 2023;23(3):1068–80.
54. Beeche C. Super U-Net: A modularized generalizable architecture. Pattern Recognit. 2022;128:108669.
55. U-Net. https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/ Accessed 2025 January 14.
56. Ahmed I, Ahmad M, Chehri A, Jeon G. A Smart-Anomaly-Detection System for Industrial Machines Based on Feature Autoencoder and Deep Learning. Micromachines (Basel). 2023;14(1):154. pmid:36677215
57. Arshad Mayo S, Rehman S, Cai Z. High-accuracy gearbox fault detection using deep learning on vibrational data. J Phys Conf Ser. 2024;2853(1):12066.
58. Karaduman G, Kiliç İ, Tasar B, Yaman O. GearDetectionNET: Detection of Gearbox Faults Under Different Load Conditions via 1D-CNN Architecture. J Vib Eng Technol. 2025;13(7).

