Abstract
The automatic detection of the degree of surface corrosion on metal structures is of significant importance for assessing structural damage and safety. To effectively identify the corrosion status on the surface of coastal metal facilities, this study proposed a CBG-YOLOv5s model for metal surface corrosion detection, based on the YOLOv5s model. First, we integrated the Convolutional Block Attention Module (CBAM) into the C3 module and developed the C3CBAM module. This module effectively enhanced the channel and spatial attention capabilities of the feature map, thereby improving the feature representation. Second, we introduced a multi-scale feature fusion concept in the feature fusion part of the model and added a small target detection layer to improve small target detection. Finally, we designed a lighter C3Ghost module, which reduced the number of parameters and the computational load of the model, thereby improving the running speed of the model. In addition, to verify the effectiveness of our method, we constructed a dataset containing 6000 typical images of metal surface corrosion and conducted extensive experiments on this dataset. The results showed that, compared to the YOLOv5s model and several other commonly used object detection models, our method achieved superior performance in terms of detection accuracy and speed.
Citation: Fu M, Jia Z, Wu L, Cui Z (2024) Detection and recognition of metal surface corrosion based on CBG-YOLOv5s. PLoS ONE 19(4): e0300440. https://doi.org/10.1371/journal.pone.0300440
Editor: Mohammed Abdelsamea, University of Exeter, UNITED KINGDOM
Received: November 7, 2023; Accepted: February 28, 2024; Published: April 10, 2024
Copyright: © 2024 Fu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data set underlying this article has been uploaded to Figshare and is accessible via https://figshare.com/s/63accf67b33154b291ed.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
With the rapid advancement of science and technology, the marine economy has increasingly become a significant component of China’s national economy [1,2]. A large number of nearshore and offshore facilities have been put into operation, with steel being the primary material required for these facilities. However, the marine environment is one of the most corrosive natural environments: the marine atmosphere contains various salt particles that have a strong corrosive effect on metals. Fig 1 illustrates the corrosion status of metal facilities in a marine environment. Metal corrosion leads to material loss and wastage, thereby causing substantial economic losses. According to a 2015 research report by the Chinese Academy of Engineering, economic losses caused by corrosion in China amount to approximately USD 310 billion, equivalent to 3.34% of GDP [3]. Furthermore, metal corrosion degrades the mechanical properties of metal materials, such as strength, toughness and hardness, rendering them incapable of withstanding normal workloads and potentially leading to failures such as fractures or deformations. For critical equipment and structures like bridges, pipelines, aircraft and trains, this poses severe safety threats and could even result in catastrophic consequences. At the same time, metal corrosion generates substantial waste and toxic substances (such as rust, verdigris and lead salts). These substances can infiltrate the environment via water flow, air flow or soil seepage, polluting water sources, soil and air, and thereby posing threats to human health as well as flora and fauna. Therefore, rapid and accurate identification and grading of corrosion on steel facilities in marine atmospheric environments is of positive significance, enabling relevant personnel to adopt appropriate anti-corrosion measures and avoid further serious corrosion damage to equipment.
The traditional evaluation of the degree of corrosion on metal structures primarily relies on manual visual methods. This approach not only consumes a significant amount of manpower and time, but long periods of inspection can also lead to operator fatigue. Furthermore, due to a lack of consistency in the evaluation criteria and susceptibility to subjective influences, the accuracy of the determination is limited. As a result, researchers have successively proposed various methods for detecting metal corrosion. Currently, methods for detecting corrosion on metal surfaces can be classified into two main categories: one is based on physical or chemical principles, and the other uses computer vision technology for detection.
Detection methods based on physical or chemical processes primarily use the physical or chemical changes that occur on the metal surface during the corrosion process to assess the condition and degree of metal corrosion. These methods include electrochemical methods, resistance methods, ultrasonic waves, and spectroscopy, among others. Compared with traditional manual visual methods, these detection methods can directly measure corrosion parameters of the metal surface, such as potential, current, resistance, sound speed, magnetic field and spectral response, and infer the corrosion status of the metal surface from these parameters. However, these detection methods require professional instruments and trained operators.
With the constant advancement of computer technology, computer vision has been extensively applied to the field of corrosion detection. Researchers have begun to utilise computer vision technology for the classification and object detection of corrosion images. This allows for an intuitive presentation of the corrosion morphology and distribution on the metal surface, and enables efficient and accurate processing of large amounts of image data, thereby enhancing detection efficiency and accuracy. Although traditional detection methods based on computer vision have seen some improvements, their accuracy has not yet reached the desired level, and their operation is not sufficiently convenient. Deep learning has brought new ideas to object detection. Convolutional neural networks [4] can extract high-level features, thereby significantly improving classification accuracy. However, current deep learning-based object detection approaches suffer from issues such as the loss of metal corrosion information when models are trained on open-source datasets, large model sizes, and an inability to meet practical application requirements.
Therefore, we selected the most widely used YOLOv5s model in the YOLO series [5–8] as the basis and designed a more lightweight CBG-YOLOv5s object detection model. This model has the benefits of small size, high speed and high accuracy, which theoretically supports deployment on embedded platforms.
The study’s main contributions can be summarised as follows:
- We have developed a corrosion detection model for metal surfaces, called CBG-YOLOv5s. This model can classify metal surfaces into three corrosion levels based on the texture, colour and depth of the corrosion. It can assist technicians in accurately and promptly identifying the corrosion level of metals.
- We collected 600 original images of corroded metal surfaces and performed data augmentation, constructing a dataset of metal surface corrosion images containing 6000 images.
- In order to improve the detection accuracy of the model, we introduced the C3CBAM module and C3Ghost module, expanded the scale of the YOLOv5s model, and added a small target detection layer. Compared with several other commonly used object detection models, our method achieved superior performance in terms of detection accuracy and speed.
2. Related work
In the early stages of the research, corrosion detection primarily relied on physical or chemical methods. Yeih et al. [9] used the amplitude attenuation method of ultrasonic detection technology to assess damage from metal corrosion in reinforced concrete, achieving significant detection results. Hong et al. [10] proposed a method for detecting the detachment of the outer protective layer of underwater metal pipes using ultrasound imaging technology based on Support Vector Machine (SVM) and Histogram of Oriented Gradients (HOG) techniques. However, this method did not perform well in detecting narrow detachment damage areas. Wicker et al. [11] proposed a relatively economical method for non-destructive detection of metal corrosion based on simple data analysis and infrared thermal imaging technology. To effectively assess the degree of wire corrosion, Li et al. [12] used Acoustic Emission (AE) technology to detect the externally applied current cathodic protection (ICCP) and prestress changes in the marine atmospheric rainwater environment, and based on this, assessed the degree of rebar corrosion.
In the field of metal corrosion detection, image processing technology has achieved significant detection results. For instance, Pakrashi et al. [13] proposed a method for detecting the degree of metal surface corrosion based on regional optical contrast. This method analyzed the optical contrast between the corrosion area and the surrounding environment and combined it with image processing technology for edge detection. However, it was mainly applicable to pitting corrosion on aluminum surfaces. Additionally, Shen et al. [14] proposed a corrosion recognition method that combined color image processing technology with Fourier transform, based on texture features and color features. This method showed a clear advantage in dealing with non-uniform illumination conditions. Ghanta et al. [15] evaluated corrosion defects on the surface of steel-coated bridges by using a single-scale Haar wavelet transform on RGB sub-images, but this detection method had high requirements for image quality.
In the field of metal corrosion detection, two main categories of deep learning-based object recognition algorithms have been used: single-stage and two-stage. Two-stage algorithms divide feature extraction and object detection into two steps, mainly represented by the Region-based Convolutional Neural Network (R-CNN) [16], Faster R-CNN [17] and Fast R-CNN [18]. For example, Guo et al. [19] proposed a Faster R-CNN model with a feature enhancement mechanism. This model used the ResNet-101 residual network as the backbone network and implemented a feature enhancement mechanism after the Region of Interest (ROI) pooling layer to detect rust on transmission line fittings, achieving a detection accuracy of more than 97%. Additionally, Tian et al. [20] proposed a metal corrosion recognition algorithm based on Faster R-CNN and rust HSI colour features. This algorithm converted the corrosion image into the HSI colour space to determine and annotate the corrosion pixels. The Faster R-CNN model was then used to locate and detect the annotated corrosion areas, achieving high accuracy and recall rates. However, two-stage object detection models have complex network structures, poor real-time performance and large model sizes with high hardware demands, making them unsuitable for subsequent deployment on embedded development platforms.
Optimized single-stage detection methods integrate the two steps of feature extraction and object detection, effectively reducing redundant computations and significantly improving the speed of detection. SSD [21] and the YOLO series are the most representative methods in this field. For example, Ramalingam et al. [22] proposed an enhanced SSD MobileNet framework, which includes a periodic pattern detection filter based on self-filtering, used to detect surface defects on aircraft caused by corrosion and cracks. Mukhiddinov et al. [23] proposed a multi-class fruit and vegetable classification system based on an improved YOLOv4 model, which divided the recognised fruit types into fresh or rotten categories. Deyin et al. [24] proposed an aircraft surface defect detection model based on the YOLO network, used to automatically identify corrosion and various defects on the aircraft surface. Matthaiou et al. [25] achieved good results by training SSD using transfer learning technology to detect corrosion objects. Jia et al. [26] proposed the Corrosion-YOLOv5s metal corrosion target detection model based on the YOLOv5s model, which achieved an accuracy rate of 90.5%.
We have made a thorough analysis and study of previous work, and the advantages and disadvantages of each method are summarised in Table 1. We have learnt from the previous research experience and designed the CBG-YOLOv5s model for this work. Compared with the above research work, our proposed model has improved in terms of detection efficiency and accuracy.
The remainder of this paper is organised as follows: Section 3 gives a detailed description of the basic model used in this study. Section 4 discusses the proposed model in more detail and the corresponding improvement measures. Section 5 describes in detail the process of preparing the dataset. The experimental results of the model are mainly shown in section 6 and the experimental results are discussed. Finally, Section 7 summarises the entire paper and looks forward to future work.
3. YOLOv5 model
YOLOv5 is a widely used single-stage object recognition algorithm. Based on the depth and width of the model, the YOLOv5 series is classified into YOLOv5x, YOLOv5l, YOLOv5m and YOLOv5s. Considering that the use of a model with a huge number of parameters on embedded devices could have adverse effects, this study chose the lighter YOLOv5s as the benchmark model for metal corrosion detection from a practical point of view. As can be seen in Fig 2, the structure of this model mainly comprises four parts: the input end, the feature acquisition network (backbone), the feature fusion network (neck), and the recognition head (head).
In the YOLOv5 algorithm, the input end adopted Mosaic data enhancement, adaptive image resizing and adaptive anchor box calculation strategies for data pre-processing. The Mosaic data augmentation method randomly cropped a selected image and three other random images and then stitched them together to form a new image. This enriched the background of the image, increased the number of small targets, increased the diversity of the dataset and thereby improved the robustness of the model. The backbone structure was composed of multiple Conv, C3 and SPPF modules. By combining multiple Conv and C3 modules, the feature extraction capability of the model was improved. The SPPF module cascaded multiple small-size pooling kernels to replace the single large-size pooling kernels of the earlier SPP module, greatly improving the model’s feature extraction capability. This improvement helped to detect target objects of different sizes and further accelerated the model’s operation. In the neck part, the YOLOv5 model adopted the PANet structure and added a bottom-up path enhancement structure to the top-down feature pyramid to improve the network’s feature fusion capability. The head part contained three detection layers corresponding to three distinct sizes of feature maps acquired from the neck part. According to the size of the feature map, a grid was divided on the feature map, and for each grid, three anchors with different aspect ratios were preset for target prediction and regression.
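The Mosaic stitching step described above can be sketched as follows. This is a simplified NumPy illustration; the actual YOLOv5 implementation also rescales the source images and remaps their bounding-box labels, which is omitted here, and the image sizes are hypothetical.

```python
import numpy as np

def mosaic(images, out_size=640, seed=0):
    """Stitch four images into one Mosaic canvas.

    Each image is cropped from its top-left corner to fill one quadrant
    around a randomly chosen centre point (a simplification of YOLOv5's
    behaviour, which also rescales images and remaps box labels).
    """
    rng = np.random.default_rng(seed)
    s = out_size
    canvas = np.full((s, s, 3), 114, dtype=np.uint8)  # grey border fill
    # Random centre point in the middle half of the canvas
    xc = int(rng.uniform(s * 0.25, s * 0.75))
    yc = int(rng.uniform(s * 0.25, s * 0.75))
    quadrants = [(0, 0, xc, yc), (xc, 0, s, yc),
                 (0, yc, xc, s), (xc, yc, s, s)]
    for img, (x1, y1, x2, y2) in zip(images, quadrants):
        h, w = y2 - y1, x2 - x1
        canvas[y1:y2, x1:x2] = img[:h, :w]  # crop to fit the quadrant
    return canvas

# Four dummy 640x640 images with different grey levels
imgs = [np.full((640, 640, 3), g, dtype=np.uint8) for g in (40, 90, 140, 190)]
result = mosaic(imgs)
```

Because each quadrant shows a different image and background, a single training sample exposes the model to four contexts at once, which is what enriches small-target diversity.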
YOLOv5 employed CIoU_Loss [27] as the loss function for the bounding box regression, as shown in Eq 1. In this context, Eq 2 defined the weighting coefficient, Eq 3 was used to gauge the consistency of the aspect ratios of the two rectangular boxes, and Eq 4 was used to obtain the ratio of the intersection to the union of the predicted box and the ground-truth box.
$$L_{CIoU} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{c^2} + \alpha v \tag{1}$$

$$\alpha = \frac{v}{\left(1 - IoU\right) + v} \tag{2}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{3}$$

$$IoU = \frac{\left|B \cap B^{gt}\right|}{\left|B \cup B^{gt}\right|} \tag{4}$$
In this context, b and bgt denote the centre points of the predicted and ground-truth boxes respectively, ρ denotes the Euclidean distance between these two centre points, c denotes the diagonal length of the smallest box enclosing the two boxes, and w, h and wgt, hgt denote the width and height of the predicted and ground-truth boxes respectively.
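Putting Eqs 1–4 together, the CIoU loss for a single box pair can be sketched in plain Python (a minimal single-pair illustration; YOLOv5's official implementation is vectorised over whole batches of boxes):

```python
import math

def ciou_loss(box, box_gt):
    """CIoU loss (Eqs 1-4) for boxes given as (x1, y1, x2, y2)."""
    # IoU (Eq 4)
    ix1, iy1 = max(box[0], box_gt[0]), max(box[1], box_gt[1])
    ix2, iy2 = min(box[2], box_gt[2]), min(box[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = box_gt[2] - box_gt[0], box_gt[3] - box_gt[1]
    iou = inter / (w * h + wg * hg - inter)

    # Squared centre distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((box[0] + box[2] - box_gt[0] - box_gt[2]) ** 2
            + (box[1] + box[3] - box_gt[1] - box_gt[3]) ** 2) / 4.0
    c2 = ((max(box[2], box_gt[2]) - min(box[0], box_gt[0])) ** 2
          + (max(box[3], box_gt[3]) - min(box[1], box_gt[1])) ** 2)

    # Aspect-ratio consistency term v (Eq 3) and its weight alpha (Eq 2)
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / (1.0 - iou + v + 1e-9)

    return 1.0 - iou + rho2 / c2 + alpha * v  # Eq 1
```

For two identical boxes the loss is zero, and it grows as the boxes drift apart in position, size or aspect ratio, which is why CIoU gives a smoother regression signal than plain IoU.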
4. CBG-YOLOv5s model
In this study, we designed the CBG-YOLOv5s object detection network model based on the YOLOv5s version, as shown in Fig 3. In response to issues such as high background complexity of the corrosion area on the metal surface in the dataset, small category differences, and large dataset density and parameter volume, we made improvements in three aspects.
4.1. Convolutional attention mechanism module based C3CBAM model
In the field of vision, the attention mechanism played a key role. It gave neural networks the ability to generate masks, allowing them to automatically learn and focus on important areas. By performing weighted iterations based on mask scores, it was possible to increase the influence of the focus area and reduce the weight of irrelevant information, thereby optimising the network model’s performance. Depending on the location of the mask generation, the attention mechanism could be broadly classified into three types: channel attention mechanism, spatial attention mechanism, and mixed-domain attention mechanism. In this study, a mixed-domain attention mechanism, the CBAM module [28] was adopted. The CBAM module included a channel attention module (CAM) and a spatial attention module (SAM), the detailed structure of which is shown in Fig 4.
The working process of the CBAM module was as follows: initially, the input feature F ∈ ℝ^{C×H×W} was processed by the CAM module, generating the channel weight vector M_c ∈ ℝ^{C×1×1}. Then, M_c was multiplied element-wise with F, resulting in the weighted feature F′. Subsequently, F′ was input into the SAM module, yielding the spatial weight matrix M_s ∈ ℝ^{1×H×W}. Finally, M_s was multiplied with F′, producing the spatially weighted feature F′′.
$$F' = M_c(F) \otimes F \tag{5}$$

$$F'' = M_s(F') \otimes F' \tag{6}$$
The CAM module worked by focusing on useful feature channels and ignoring useless ones, allowing the model to concentrate on effective information. Initially, two parallel operations, average pooling and max pooling, were employed to aggregate the spatial information of the feature map F. The two pooled descriptors were then passed through a shared Multilayer Perceptron (MLP), and their outputs were summed and passed through a non-linear activation function to obtain the output of the CAM. The calculation formula for CAM was as follows:
$$M_c(F) = \sigma\left(MLP\left(AvgPool(F)\right) + MLP\left(MaxPool(F)\right)\right) \tag{7}$$
By introducing the SAM module, the model was able to focus more on the areas of interest in the feature map. Initially, we used two parallel operations, average pooling and max pooling, to integrate the channel information in the feature map F′. Then, these two types of features were concatenated and passed through a convolutional layer, resulting in the final output of the SAM. The calculation formula for SAM was as follows:
$$M_s(F') = \sigma\left(f^{7\times7}\left(\left[AvgPool(F');\, MaxPool(F')\right]\right)\right) \tag{8}$$
Here, σ represents the sigmoid function, and f^{7×7} represents a convolution operation with a filter size of 7×7.
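Eqs 5–8 can be illustrated with a small NumPy sketch. The dimensions, reduction ratio and random weights below are purely illustrative stand-ins for learned parameters; a real implementation would use a deep learning framework's convolution and MLP layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """CAM (Eq 7): shared MLP over avg- and max-pooled descriptors."""
    avg = F.mean(axis=(1, 2))                    # (C,) avg-pooled
    mx = F.max(axis=(1, 2))                      # (C,) max-pooled
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0)   # shared MLP, ReLU hidden
    return sigmoid(mlp(avg) + mlp(mx))           # (C,) channel weights Mc

def spatial_attention(F, kernel):
    """SAM (Eq 8): 7x7 convolution over channel-pooled maps."""
    stacked = np.stack([F.mean(axis=0), F.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = F.shape[1], F.shape[2]
    out = np.zeros((H, W))
    for i in range(H):                           # naive sliding-window conv
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                          # (H, W) spatial weights Ms

def cbam(F, W1, W2, kernel):
    """Eqs 5-6: F' = Mc(F) * F, then F'' = Ms(F') * F'."""
    Fp = channel_attention(F, W1, W2)[:, None, None] * F
    return spatial_attention(Fp, kernel)[None] * Fp

# Toy example: C=8 channels, 16x16 feature map, reduction ratio r=4
C, H, W, r = 8, 16, 16, 4
F = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(F, W1, W2, kernel)
```

The output has the same shape as the input, with each element scaled down by its channel and spatial attention weights, so CBAM can be dropped into the C3 module without changing tensor shapes.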
The CBAM attention mechanism enabled the network to focus on significant features and suppress unimportant ones, guiding the network towards which features to attend to and where they were located, and further improving the accuracy of corrosion area localisation and detection. We combined the C3 module with the CBAM module to form the C3CBAM module. By adding this module before the SPPF layer in the backbone and in the neck part, the expressiveness of the features and the multi-scale fusion ability were further enhanced, significantly increasing the accuracy and speed of detection.
4.2. BiFPN-CBAM weighted bi-directional feature fusion network based on fusion attention mechanism
In the field of target detection algorithms, to mitigate the loss of image features as the number of network layers increases, a feature pyramid is usually constructed to fuse semantic information at different levels. The YOLOv5 network adopts the Path Aggregation Network (PANet) structure [29], which simply fuses the features of the third to fifth layers. To fully account for the impact of the different resolutions of the incoming feature maps on the outgoing feature map, the Google team proposed the bidirectional weighted BiFPN structure [30] as an improvement on PANet. BiFPN fused the features of layers 3 to 7 and deleted nodes with small contributions in layers 3 and 7 to reduce computation. At the same time, it introduced a cross-scale connection method to avoid excessive loss of deep semantic information while preserving shallow semantic information.
This study drew inspiration from the design philosophy of BiFPN, adding two cross-scale connections between input and output nodes of the same scale to optimize the detection effect of target-dense images. Simultaneously, to enhance the model’s detection capability for small target data, we incorporated the second feature layer into the feature fusion network to retain shallow information, and increased the number of output detection layers to four and expanded the number of prediction boxes from nine to twelve. This method broadened the model’s perceptual range, heightened its sensitivity to small targets, and further improved the detection effect of small targets.
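The weighted fusion at each BiFPN node can be sketched as follows. This shows BiFPN's "fast normalized fusion"; the feature maps and scalar weights below are placeholders for the learned parameters of a real network.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN's fast normalized fusion at a single node.

    Each incoming feature map gets a learnable non-negative scalar
    weight; the weights are normalized so the fused map keeps roughly
    the same scale as its inputs.
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps w >= 0
    w = w / (w.sum() + eps)                                # normalize
    fused = np.zeros_like(np.asarray(features[0], dtype=float))
    for wi, f in zip(w, features):
        fused += wi * f
    return fused

# Stand-ins for two same-resolution feature maps arriving at one node
p4_in = np.ones((2, 2))
p5_up = np.ones((2, 2)) * 3.0
fused = fast_normalized_fusion([p4_in, p5_up], weights=[1.0, 1.0])
```

Because the weights are learned, the network decides per node how much each resolution contributes, which is the key difference from PANet's unweighted addition.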
The complexity and diversity of the background of metal corrosion, such as the inconsistency in the shooting distance of manually collected images, and the differences in corrosion color and texture caused by different metals in different environments, made it difficult for the model to extract high-quality information, thereby increasing the missed detection rate and false detection rate of the corrosion area. To effectively solve these problems, we adopted a strategy to enhance the perceptual ability of the model and further optimized it. Based on the feature fusion network we proposed, we introduced the CBAM module. In this way, YOLOv5’s feature fusion network can not only enhance the model’s detection ability for small target data, but also strengthen the importance screening of different channel features and the attention to direction and position information. We named this module BiFPN-CBAM, as shown in Fig 5.
Compared to the original YOLOv5 model’s PANet feature fusion network, the BiFPN-CBAM we designed demonstrated superior performance when dealing with datasets where factors such as dense targets, complex backgrounds, and low image resolution were present.
4.3. C3Ghost model based on lightweight neural network
Ghost Convolution [31], as a lightweight convolutional neural network, successfully reduced the number of parameters and computational complexity while maintaining model performance. Ghost Convolution adopted a two-step strategy to replace traditional convolution operations: firstly, it generated a smaller number of feature maps through ordinary convolution operations; then, it produced more feature maps through linear transformation and combination operations. Finally, it concatenated these two groups of feature maps into a Ghost feature map. Compared to ordinary convolution operations, Ghost Convolution greatly reduced computational load and the number of parameters. As shown in Fig 6, it illustrated the traditional convolution and Ghost Convolution modules.
(a) Ordinary convolution operation. (b) Ghost convolution operation.
In order to decrease the model size, in this paper we used Ghost Convolution to replace the ordinary convolution operation in the C3 module, thus constructing a lighter C3Ghost module, the concrete structure of which is shown in Fig 7. In GhostConv, we used a 1×1 convolution kernel to decrease the number of channels of the input feature map to half of the original, and concatenated it with the feature map obtained after processing with a 5×5 convolution kernel. The transformed GhostBottleneck first reduced the number of input feature map channels by half through the first GhostConv, then restored it to the original number of channels through the second GhostConv, and fused it with the features obtained through 3×3 depth convolution. By substituting the bottleneck in the C3 module with GhostBottleneck, the C3Ghost module achieved a reduction in parameters and computational complexity, improved model efficiency and speed, and enhanced the expressiveness of feature maps, thereby improving model accuracy and robustness.
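The parameter saving of GhostConv relative to an ordinary convolution can be illustrated with a quick count. The 1×1 primary and 5×5 depthwise cheap kernels follow the description above; bias terms are omitted and the channel numbers are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Parameter count of an ordinary k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def ghost_conv_params(c_in, c_out, primary_k=1, cheap_k=5):
    """GhostConv as described above: a 1x1 primary convolution produces
    half of the output channels, a depthwise 'cheap' convolution
    generates the other half, and the two halves are concatenated."""
    half = c_out // 2
    primary = c_in * half * primary_k * primary_k
    cheap = half * cheap_k * cheap_k  # depthwise: one filter per channel
    return primary + cheap

# Hypothetical channel counts for one layer
print(conv_params(128, 256, 3))      # ordinary 3x3 convolution
print(ghost_conv_params(128, 256))   # GhostConv replacement
```

For this hypothetical layer the GhostConv variant needs roughly an order of magnitude fewer parameters, which is the source of the model-size reduction reported later.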
5. Data set production
The learning results of the model were influenced by the dataset, and the generalisation ability of the model had a certain correlation with the diversity of corrosion images in the dataset. This research aimed to explore the corrosion of metal surfaces under marine environments. The data processing flow of metal surface corrosion is shown in Fig 8.
We selected the coastal area of Yantai as the data collection site due to its typical marine climate characteristics, where seawater had a strong corrosive effect on metal surfaces. To ensure the quality and consistency of the images, we used professional image collection equipment and strictly followed the standards and norms during operation. During the collection process, we obtained a total of 732 images of metal surface corrosion with different types, degrees, states, and colors. After removing blurry and duplicate images, we finally selected 600 representative images of metal surface corrosion.
In order to enhance the efficiency of model training and to reduce the computational burden, we standardised the size of the images by adjusting all images to a dimension of 640×640. Based on the colour, texture and other characteristics of metal surface corrosion, we classified it into three types: light corrosion (LC), moderate corrosion (MC) and heavy corrosion (HC). Fig 9 shows the metal surface corrosion conditions of these three types. To ensure the balance of various targets in the dataset, we collected 200 images of metal surface corrosion in each grade and used LabelImg annotation software to accurately annotate the corroded areas on the metal surface.
To increase the robustness of the model, we employed five data augmentation techniques, including cropping, translation, brightness adjustment, noise addition and angle rotation. These techniques were randomly combined to enhance the original images. The enhancement effect is shown in Fig 10. The enhanced images of metal surface corrosion were randomly divided into training, validation and test sets in a ratio of 8:1:1, resulting in a data set of 4800 training images, 600 validation images and 600 test images, making a total of 6000 images. These images were used to construct a metal surface corrosion detection dataset for this study. The specific partitioning of the dataset is shown in Table 2.
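The 8:1:1 partition described above can be reproduced with a simple shuffled split. This is a sketch; the file names and random seed are illustrative, not the ones used in the study.

```python
import random

def split_dataset(paths, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle the image paths and split them 8:1:1 into
    training, validation and test sets."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)  # fixed seed for reproducibility
    n_train = int(len(paths) * ratios[0])
    n_val = int(len(paths) * ratios[1])
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

# Illustrative file names for the 6000 augmented images
files = [f"img_{i:04d}.jpg" for i in range(6000)]
train_files, val_files, test_files = split_dataset(files)
```

Shuffling before splitting ensures each subset contains a mixture of all three corrosion grades rather than contiguous runs of one class.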
6. Experiments
6.1. Experimental environment and parameter settings
The experimental environment is shown in Table 3.
We set the initial learning rate of the YOLO model to 0.01, the momentum parameter to 0.937, the batch size to 16, and the number of training epochs to 200.
6.2. Evaluation indicators
The aim of this study was to enhance the model’s detection accuracy while reducing the model’s parameter volume. To test the effectiveness of the proposed model, we quantitatively evaluated the performance of the algorithm using a set of evaluation metrics widely accepted in the object detection field, including precision, recall, F1 score, FPS, mAP and parameter volume, defined in Eqs (9)–(13). Among them, precision measured the proportion of predicted positive samples that were actually positive; recall represented the proportion of actual positive samples that were correctly predicted; AP was the area under the precision-recall curve for each category; mAP was the mean of the AP values over all categories, and the higher its value, the greater the detection accuracy of the model; FPS represented the number of frames processed per second, used to assess the detection speed of the model.
$$Precision = \frac{TP}{TP + FP} \tag{9}$$

$$Recall = \frac{TP}{TP + FN} \tag{10}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{11}$$

$$AP = \int_0^1 P(R)\,dR \tag{12}$$

$$mAP = \frac{1}{c}\sum_{i=1}^{c} AP_i \tag{13}$$
where TP, FP and FN are the number of true positives, the number of false positives and the number of false negatives, respectively, and c is the number of classes.
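Eqs 9–13 can be computed directly from the confusion counts. The sketch below approximates the AP integral of Eq 12 with a trapezoidal rule; YOLOv5's own evaluation uses an interpolated precision-recall curve, so values may differ slightly.

```python
def precision(tp, fp):
    """Eq 9: proportion of predicted positives that are true positives."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq 10: proportion of actual positives that are detected."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Eq 11: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def average_precision(recalls, precisions):
    """Eq 12: area under the precision-recall curve, approximated here
    with the trapezoidal rule over sorted recall points."""
    pts = sorted(zip(recalls, precisions))
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2.0  # trapezoid between points
    return ap

def mean_average_precision(aps):
    """Eq 13: mean of the per-class AP values."""
    return sum(aps) / len(aps)
```

For example, 80 true positives with 20 false positives and 10 false negatives give a precision of 0.8 and a recall of about 0.89.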
6.3. Comparison with YOLOv5s model
Through an in-depth analysis of the comparative experimental results, we found that under the same environmental conditions, the same initial parameter settings, and the same dataset, the CBG-YOLOv5s model achieved significant improvements in all evaluation indicators compared to the original YOLOv5s model (see Tables 4 and 5 for details). Notably, in light corrosion images where there were a large number of small targets, the CBG-YOLOv5s model demonstrated significant performance advantages, which fully validated the effectiveness of our strategy to add a small target detection layer. Further, after a comprehensive comparison of these two models in Table 6, we found that the CBG-YOLOv5s model not only achieved significant improvements in key indicators such as accuracy, recall rate, F1-score, mAP@.5 and mAP@.5:.95 (increasing by 3.2%, 1.6%, 2.4%, 2.7% and 1.2% respectively), but also successfully reduced about 1.37 million parameters. These experimental results fully prove that the CBG-YOLOv5s model can not only enhance the detection accuracy, but also reduce the parameter volume by about 20%. Therefore, the proposed improvement strategy has broad prospects of application in the field of metal surface corrosion degree detection.
Upon conducting an in-depth testing of the YOLOv5s model and the CBG-YOLOv5s model, we found that the CBG-YOLOv5s model significantly improved the issues of small target misdetection and omission in complex backgrounds compared to the original model, and demonstrated more accurate positioning capabilities and higher detection accuracy. Notably, the CBG-YOLOv5s model was not only able to accurately detect corrosion areas, but also accurately identify the grades of corrosion areas. The specific test results can be seen in Figs 11 and 12.
6.4. Comparison with other target detection models
After an in-depth comparison of the enhanced algorithm with current leading object detection algorithms (including CenterNet, Faster R-CNN, SSD, YOLOv3, YOLOv4 and YOLOv5s) under the same experimental conditions and on the same dataset, Table 7 presents the comparison results. We found that the CBG-YOLOv5s model has a faster detection speed than CenterNet and the two-stage Faster R-CNN model, mainly owing to the lower complexity of single-stage object detection algorithms. In addition, the CBG-YOLOv5s model achieved significant improvements in precision, recall, mAP and F1 score compared to the SSD, YOLOv3, YOLOv4 and YOLOv5s single-stage object detection models, demonstrating higher detection accuracy. Although the added small target detection layer increased the number of network layers and reduced the recognition speed of the improved CBG-YOLOv5s model, its recognition speed still far exceeded all comparison models except the original YOLOv5s. These comparative experimental results fully validate the superior performance of the CBG-YOLOv5s object detection model in metal corrosion target detection.
Fig 13 demonstrated the performance of the CBG-YOLOv5s model in comparison with other models in terms of accuracy in recognizing surface corrosion on metals. It was clearly evident that the proposed model outperformed other detection models in terms of accuracy in recognizing surface corrosion on metals. Considering all factors, our model exhibited superior detection results.
6.5. Ablation experiments
This paper introduced three enhancement methods, namely C3CBAM, BiFPN-CBAM, and C3Ghost. To verify the effectiveness of these modules for CBG-YOLOv5s in the detection of surface corrosion on metals, a series of ablation experiments were conducted. The results are presented in Table 8. In the table, “-” indicates that the module was not used, while “√” signifies that the module was incorporated.
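As background for the ablation results, the CBAM attention that the C3CBAM module builds on can be sketched in PyTorch as follows. This is a minimal sketch after Woo et al. (channel attention followed by spatial attention), not the exact C3CBAM implementation used in our model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    # Squeeze spatial dims with avg- and max-pooling, pass both through a
    # shared bottleneck MLP, and gate channels with a sigmoid.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return x * torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    # Pool across channels, then a 7x7 conv produces a spatial mask.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    # Channel attention refines "what" to attend to, spatial attention "where".
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```

In C3CBAM, a block of this form is inserted into the C3 module so that the bottleneck features are reweighted before being concatenated.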
The experimental results demonstrated that the inclusion of the C3CBAM module raised the recall rate to 86.7%, validating its effectiveness in target localization. The introduction of the BiFPN-CBAM module increased precision by 1.8% and mAP by 0.8%, while also reducing the number of parameters to a certain extent. These findings suggest that adding a small target detection layer aids in identifying small targets within the dataset. The C3Ghost module reduced the number of parameters by 0.736×10⁶ while achieving an mAP@0.5 of 95%, opening up possibilities for deployment on embedded platforms.
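The multi-scale fusion underlying the BiFPN-CBAM module relies on BiFPN-style fast normalized fusion, in which learnable non-negative weights decide how much each input scale contributes. A simplified sketch after Tan et al. (not the exact code of our fusion layer, which additionally applies CBAM) is:

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    # BiFPN fast normalized fusion: out = sum_i(w_i * x_i) / (sum_i(w_i) + eps),
    # with w_i kept non-negative via ReLU. All inputs must share one shape
    # (in practice, features are resized to a common scale first).
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)
        return sum(wi * x for wi, x in zip(w, inputs))
```

Because the weights are learned, the network can emphasize the high-resolution branch that carries most of the small-target evidence.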
We used four metrics, namely precision, recall, mAP@0.5 and mAP@0.5:0.95, to assess the performance of the CBG-YOLOv5s model. Fig 14 illustrates the curves of these evaluation metrics throughout training. The trends of the four metrics were largely similar: all rose rapidly in the early stages of training, stabilized in the later stages, and converged after 200 epochs.
The experimental results substantiated that the enhancement strategies we proposed not only significantly improved the detection accuracy of the model for identifying surface corrosion on metals, but also effectively reduced the parameter volume. These achievements fully validated the effectiveness of the improvement methods we proposed.
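To illustrate how the Ghost design in C3Ghost reduces the parameter count, the following sketch compares a plain 1×1 convolution with a Ghost convolution that produces half its output channels via a cheap depthwise operation (after Han et al.; a simplified sketch with hypothetical channel sizes, assuming an even output-channel count, not the exact C3Ghost module):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    # Generate half the output channels with an ordinary conv, then derive
    # the other half from them with a cheap 5x5 depthwise conv.
    # Assumes out_ch is even (true for the channel widths used in YOLOv5).
    def __init__(self, in_ch, out_ch, kernel_size=1):
        super().__init__()
        primary = out_ch // 2
        self.primary = nn.Conv2d(in_ch, primary, kernel_size,
                                 padding=kernel_size // 2, bias=False)
        self.cheap = nn.Conv2d(primary, primary, 5, padding=2,
                               groups=primary, bias=False)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

def param_count(m):
    return sum(p.numel() for p in m.parameters())

plain = nn.Conv2d(64, 128, 1, bias=False)  # 64*128   = 8192 weights
ghost = GhostConv(64, 128)                 # 64*64 + 64*25 = 5696 weights
```

The depthwise "cheap" branch grows only linearly in the channel count, which is the source of the parameter savings observed in the ablation.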
6.6. Limitation and discussion
Although our research produced promising results, shortcomings remained. The main limitation is that our dataset covered only a small number of corroded metal materials and was therefore not fully representative, since different metal materials differ in the shape, colour, depth and extent of their corrosion. In addition, a complete metal surface corrosion detection system still needs to be developed before the method can be applied in practical engineering.
To address these problems, our next goal is to expand the dataset by collecting corrosion images of as many different metal materials as possible, thereby improving the generalisation ability of the CBG-YOLOv5s model so that it can adapt to more detection scenarios. Because this may reduce detection speed, we will further optimise the model structure to keep it lightweight while preserving detection accuracy. In addition, we will develop a complete detection system around this model and integrate it into handheld devices so that it can be used in real-world scenarios.
7. Conclusion
This paper proposed a novel model, CBG-YOLOv5s, for the detection of surface corrosion on metals. The model was designed to address challenges such as high background complexity in the corrosion area, small inter-class differences, and dense targets. Firstly, we designed the C3CBAM module to extract features of surface corrosion on metals more effectively. Secondly, we introduced the BiFPN-CBAM module to enhance the model's detection capability for targets of different scales and to aid in the accurate identification of small targets. Lastly, we designed a lightweight C3Ghost module to compress the number of parameters, making the entire model more compact. Compared to other target detection models, our method achieves higher accuracy in the task of detecting surface corrosion on metals while remaining lightweight. Future research will focus on two directions: first, we will collect more extensive images of surface corrosion on metals to further improve the generalization ability of the CBG-YOLOv5s model, making it applicable to a wider range of detection environments; second, while maintaining detection accuracy, we will further lighten the model for embedding in mobile handheld devices.