
Detection of submarine pipeline and cable targets based on depth feature of high resolution sonar image

  • Dandan Liu ,

    Contributed equally to this work with: Dandan Liu, Zezhou Jin, Jiajie Chen

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation College of Electrical Engineering, Yancheng Institute of Technology, Yancheng, China

  • Zezhou Jin ,

    Contributed equally to this work with: Dandan Liu, Zezhou Jin, Jiajie Chen

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation The College of Electrical Engineering and New Energy, Hubei Provincial Engineering Technology Research Center for Microgrid, China Three Gorges University, Yichang, China

  • Jiajie Chen ,

    Contributed equally to this work with: Dandan Liu, Zezhou Jin, Jiajie Chen

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation College of Electrical Engineering, Yancheng Institute of Technology, Yancheng, China

  • Zhiping Xu

    Roles Funding acquisition, Resources, Supervision, Writing – review & editing

    zhipingxu@jmu.edu.cn

    Affiliation School of Ocean Information Engineering, Jimei University, Xiamen, China

Abstract

Side-Scan Sonar (SSS) feature extraction for submarine pipeline and cable targets suffers from poor real-time performance, a high false detection rate, and difficulty of deployment on edge equipment. To address these problems, this study applies a deep neural network based on deep feature technology to detect submarine pipeline and cable targets. To enable real-time detection of submarine pipelines and cables in SSS imagery, we improve the YOLO11n-seg model by incorporating the A2C2f and DSConv modules, leveraging the characteristic features of the target images. The improved model reduces the false detection rate for submarine pipeline features in SSS images, shrinks the parameter size, and enables lightweight deployment. Ablation and comparative experiments are designed on the Marine-PULSE submarine pipeline and cable dataset. The experimental results show significant improvements over the original YOLO11n-seg model: bounding-box recall improved by 9.7% and mAP@50-95 by 1.6%; instance-segmentation recall improved by 10.3% and mAP@50 by 3.6%. Detection precision and integrity are enhanced simultaneously, and the parameter size is reduced by 15%, giving stronger real-time performance. For object detection, our model's mAP@50 is improved by 5.2% compared to YOLO12n-seg and by 12.5% compared to YOLO13n-seg. The experiments show that the model designed in this study is an effective method for real-time detection of SSS submarine pipeline and cable targets, with good prospects for development and promotion.

Introduction

Submarine pipelines and cables are the ‘lifeline’ of marine infrastructure; their reliability is paramount for the development of deep-sea oil and gas resources, and they constitute a key pillar of the offshore petrochemical industry [1,2]. However, throughout the operation of a submarine pipeline, geological structure changes, seawater corrosion and sediment burial can lead to pipeline leakage [3], which causes huge economic losses and environmental pollution and has a serious impact on society and the environment [4,5]. Therefore, it is necessary to carry out regular safety inspection and maintenance of underwater pipelines to prevent leakage accidents and reduce risks. SSS and SAS (Synthetic Aperture Sonar) [6–11] are important technologies for measuring seabed topography and obtaining seabed images. SAS uses a small-aperture sonar transducer array that moves to form a virtual large aperture, obtaining higher along-track resolution [12–15]. SSS is a light acoustic sensor that generally needs to be mounted on an underwater towed body. Its equipment is simple to install and offers high lateral resolution of the target, which can be identified and judged by its shadow [16–20]. There have been many studies on autonomous detection of submarine pipelines and cables by AUVs (Autonomous Underwater Vehicles) [21–24]. Compared with underwater optical images, SSS images have irreplaceable advantages in submarine pipeline and cable inspection tasks. Underwater optical imaging is severely limited by ambient light, water turbidity and suspended particles. It cannot work effectively in deep-sea lightless environments or turbid near-shore waters, which are common scenarios for submarine pipeline operation. In contrast, side-scan sonar relies on acoustic wave propagation for imaging. It is not affected by light conditions or water transparency, and can stably acquire high-resolution seabed images at full water depth and in complex marine environments.
In addition, side-scan sonar has a longer detection range and wider coverage. It is more suitable for long-distance and large-scale seabed pipeline inspection carried by AUVs, which fully meets the practical needs of marine engineering.

The traditional target detection and recognition method for SSS images is based on manual interpretation [25–27]. In complex terrain, it has a high false detection rate and low detection efficiency [28–31], so online real-time detection of the target cannot be realized [32–34]. Target detection based on deep features uses the powerful automatic feature-learning ability of deep neural networks to mine effective features from massive seabed image data [35–38], improving processing efficiency and precision. It can automatically learn features at different scales to achieve end-to-end automatic extraction and detection of target objects such as submarine pipelines and cables [39–41]. The deep structure enables the network to extract the semantics and most salient features of objects against the background by gradually extracting and combining features of different layers. It can handle more complex tasks and has higher recognition performance to meet the needs of real-time detection [42–45].

Deep-feature-based target detection methods fall into two-stage and one-stage approaches. The two-stage method uses a PN (Proposal Network) to search for candidate target regions, and then uses a second network to refine these proposals and output the final prediction [46–49]; it mainly includes R-CNN [50] and Faster R-CNN [51]. Although these methods have high detection precision, they have slow detection speed and poor real-time performance [52–54], which cannot meet the needs of real-time detection on underwater AUV embedded platforms. The one-stage method mainly includes YOLO [55], SSD [56] and RetinaNet [57]. It does not need to generate candidate regions, and directly obtains the category and location of the target from the input image, with good real-time performance [58–63]. Among them, the YOLO series algorithms are the most widely used for their fast inference speed [64,65]. The core task of this study is to achieve end-to-end detection and instance segmentation of submarine pipeline targets in SSS images, and to meet real-time deployment requirements on the resource-constrained edge devices of AUVs. Two-stage detection algorithms have high detection accuracy, but their inference speed is too slow for real-time underwater detection. Other one-stage algorithms, such as SSD and RetinaNet, offer a poorer trade-off between detection accuracy and inference speed than the YOLO series, and they do not natively support the instance segmentation task required in this study. The YOLO series has become the most mainstream technical framework in underwater sonar image target detection, and its engineering practicability has been fully verified by a large number of existing studies. We select YOLO11n-seg as the baseline model for three key reasons. First, it natively integrates the object detection branch and instance segmentation branch.
It can complete both detection and segmentation tasks in a single network, which avoids additional computational overhead and deployment complexity caused by cascaded models. Second, the lightweight nano-version architecture of YOLO11n-seg has an extremely small parameter scale and computational cost, which is highly suitable for deployment on the edge computing platform of AUVs. Third, its highly modular network design provides great convenience for our targeted improvement. We can optimize the network structure for the slender linear features of submarine pipeline targets, while maintaining the lightweight and real-time performance of the model.

For target detection in SSS submarine pipeline images, and aiming at the scarcity of underwater data, which is not easy to obtain [66], Du et al. [67] proposed a transfer learning framework based on GoogLeNet; experiments show that ImageNet pre-training can improve model accuracy by 10%. Li et al. [68] proposed a single-stage image generation method for small-sample detection to solve the problem of insufficient data. Zheng et al. [69] used CycleGAN to convert optical images into pseudo-sonar images to alleviate data scarcity; at the same time, they improved the YOLOv8 network structure, enhanced small-target feature extraction through an attention mechanism and deformable convolution, and significantly improved detection accuracy. Zheng et al. [70] generated synthetic data based on the principle of sonar imaging, combined with YOLOv5s to achieve zero-shot learning, and completed pipeline positioning without real labeling, with a horizontal prediction error of only 0.23 pixels.

Aiming at the difficulty of underwater small-target detection and the slender shape of submarine pipelines and cables [71], Fu et al. [72] introduced K-means++ re-clustered anchor boxes to match the size distribution of small targets; at the same time, a shallow feature fusion layer was added and combined with an attention mechanism to improve the response to small targets, with mAP@50 reaching 96.1%. Cheng et al. [73] used ODConv (Omni-dimensional Dynamic Convolution) instead of traditional convolution to dynamically adapt to the target scale, and used GAM (Global Attention Mechanism) to suppress background noise; mAP@50–95 increased by 2.51%. Wang et al. [74] designed a multi-size parallel convolution module to capture features at different scales simultaneously, and compared Transformer and CBAM (Convolutional Block Attention Module); finally, an AP value of 97.62% was achieved, with an inference speed of up to 100 FPS.

Aiming at the complex background and noise of SSS seabed images, Zhou et al. [75] proposed the STGAN network, which combines a Transformer to extract global features with convolution to capture local texture; a variety of loss functions were designed, and the PSNR of the image increased by 58.73% after denoising. Lee et al. [76] combined CS (Compressed Sensing) with a CoordConv network, using coordinate information to guide denoising and achieve end-to-end training of the nonlinear reconstruction, which can still preserve pipeline edge details in low signal-to-noise-ratio scenarios. Multi-modal fusion and cross-sensor detection can improve detection accuracy and are suitable for complex seabed terrain. Liu et al. [77] constructed an end-to-end CNN to locate the cable directly from magnetic anomaly data; the positioning accuracy is 30% higher than the traditional method, and noisy data can be processed. Duan et al. [78] combined SSS and SBP (Sub-bottom Profiler) data to enhance the weight of pipeline features through SE-Net, with a recall rate of 99.2%, which is suitable for complex seabed terrain.

Under AUV resource constraints, real-time onboard processing is more challenging [79,80]. Li et al. divided long sonar images into sub-images for parallel processing, and optimized the YOLOv5s network structure to achieve a detection speed of 304 ms/frame, with an AP value of 97.62%. Yang et al. [81] proposed a lightweight SS-YOLO model that improves efficiency on edge devices with limited processing capacity and storage. To improve model generalization and realize zero-shot detection, Zheng et al. [82] generated synthetic data based on a physical model of sonar imaging, combined with YOLOv5s; the horizontal positioning error on measured data from the Yellow Sea is only 0.23 pixels. Dakhil et al. [83] systematically compared the performance of YOLO, Faster R-CNN, U-Net and other models on sonar images, and pointed out that attention mechanisms and multi-scale feature fusion are the key to improving generalization ability. In summary, the current challenges faced by the target detection task for SSS submarine pipeline images include the limited number of dataset samples, high model complexity, and poor detection accuracy [84–93].

In view of the above problems, this study introduces the A2C2f (Area-Attention Enhanced Cross-Feature) module into the YOLO11n-seg network. Its adaptive weighted fusion and cross-scale optimization capabilities significantly improve the flexibility of feature expression. DSConv (Depthwise Separable Convolution) is used to replace the standard convolution to complete the instance segmentation task for the SSS submarine pipeline while preserving detection precision. At the same time, the number of parameters and the computing resources of the model are minimized to meet the real-time detection requirements of underwater robots. On the Marine-PULSE submarine pipeline and cable dataset, through ablation and comparative experiments, and compared with the benchmark model YOLO11n-seg, the segmentation precision mAP@50 of the proposed model increases by 3.6%, the bounding-box precision mAP@50–95 increases by 1.6%, and the recall on detection bounding boxes and instance segmentation increases by 9.7% and 10.3%, respectively. At the same time, the number of parameters is reduced by 15%, so the model can accurately and quickly complete the target detection task under limited computing resources and meet the needs of AUVs for underwater real-time detection.

Baseline model

YOLO11 is one of the latest stable versions of the YOLO series released by Ultralytics. It is selected as the baseline framework in this study for three core reasons, which match the requirements of underwater pipeline and cable inspection tasks. The lightweight nano version YOLO11n-seg achieves an excellent balance between detection accuracy and inference speed. Its optimized network structure has lower computational complexity and a smaller parameter scale, making it well suited for deployment on the resource-constrained edge devices of AUVs. YOLO11n-seg natively integrates the object detection and instance segmentation branches. It can complete target positioning and contour segmentation of submarine pipelines in a single forward pass, which avoids the additional computational overhead caused by multi-model cascading. Compared with other recent YOLO versions, YOLO11 shows better feature extraction efficiency for slender linear targets in low-contrast, high-noise sonar images at the same lightweight parameter scale. The SSS image of a submarine pipeline target has problems such as low contrast, strong noise, and easy confusion between target and background. The YOLO11n-seg detection algorithm can use its CNN-based target detection framework to automatically extract deep features of the input image, such as the continuous linear structure and shadow features of the pipeline, and more accurately identify the pipeline target in a complex background. The YOLO11-seg network structure is shown in Fig 1.

It can be seen from Fig 1 that the Backbone of YOLO11-seg adopts the C3K2 module, which uses dynamic parameter adjustment to optimize shallow feature extraction and improve the calculation speed. The C3K2 module is a faster implementation of the CSP (Cross Stage Partial) bottleneck architecture with two convolutions. The CSP network divides the feature map and processes one part through the bottleneck layer, and combines the other part with the output of the bottleneck, which reduces the amount of calculation and improves the feature representation. The C3K2 module uses a smaller kernel size to make it faster while maintaining performance, so that YOLO11n-seg can extract features faster when processing images. Neck aggregates features of different resolutions and passes them to Head for prediction. The C3K2 module is used to improve the speed and performance of the feature aggregation process.

However, submarine pipelines often appear bent, broken, or partially covered by sediments. In the submarine pipeline segmentation task, the fixed convolution kernel weights and rigid receptive field of the C3K2 module struggle to adapt to drastic changes in target shape, scale and background, which can easily lead to fragmented or missed detections. Its conventional convolution operation has no specific sensitivity to edge information in the image, which limits its ability to extract edge details. To improve the model's detection of the submarine pipeline target, the C3K2 module is replaced with an attention-mechanism module, so that the model can accurately focus on low-contrast targets amid large amounts of irrelevant background and can dynamically adjust its receptive field to adapt to the irregularity of the pipeline target.

YOLO11-seg enhances the feature extraction ability by adding a C2PSA module after SPPF (Spatial Pyramid Pooling – Fast), thereby improving the detection accuracy of the model. C2PSA is an extension of C2f. By introducing the PSABlock (Position-Sensitive Attention Block) mechanism, it aims to enhance the feature extraction ability through the multi-head attention mechanism and the feedforward neural network FFN (Feed-Forward Neural Network). It can selectively add residual structure to optimize gradient propagation and network training effect, whose detailed workflow is shown in Fig 2.

It can be seen from Fig 2 that the input feature map is first mapped into three independent feature matrices: query, key and value, through three parallel 1×1 convolution layers. Then, a reshape operation is performed on Q, K and V to split the feature maps into multiple sub-regions along the spatial dimension for efficient regional attention calculation. The similarity between Q and K is calculated via MatMul, and the result is adjusted by the attention scaling factor. The relative position bias (Pos) is added to the scaled similarity matrix, and the attention weight is obtained through Softmax normalization. The attention-weighted features are generated by matrix multiplication between the attention weight and the V matrix. After that, the features are processed by the FFN, and the final output of the block is obtained through two residual connections. This position-sensitive attention mechanism can effectively capture the long-distance dependence of features, enhance the model’s ability to extract the continuous linear features of submarine pipeline targets, and suppress the interference of complex seabed background noise.

Due to the high computational complexity and low energy-efficiency ratio of the standard Conv module, it is not conducive to real-time detection of underwater pipeline targets. Therefore, a lightweight Conv module should be used in its place. While maintaining detection accuracy, the number of model parameters and the amount of calculation are minimized, and inference speed is improved to achieve real-time detection of underwater pipeline targets.

Proposed method

Aiming at the problems analyzed in section 2, the YOLO11-seg network model is improved, the C3K2 module is replaced by the A2C2f module, the standard Conv module is replaced by the DSConv module, and a lightweight model for real-time detection of SSS cable targets is designed.

A2C2f module

The A2C2f module is constructed based on Transformer attention. The ABlock module with A2 (Area Attention) is used for residual-enhanced feature extraction. R-ELAN (Residual Efficient Layer Aggregation Networks) further enhances the optimization ability and feature expression ability of the model by introducing residual connections and a new feature aggregation method.

Fig 3 shows the A2C2f module network structure. The core ABlock module of A2C2f includes regional attention and MLP (Multilayer Perceptron) layer, which reduces the expansion ratio of MLP in typical Transformer from 4.0 to 1.2, balances the calculation amount of attention layer and feedforward layer, and reduces the depth of stacked blocks to promote optimization, which is used for fast feature extraction and attention mechanism enhancement.

A2 uses the most direct equipartition method: it divides a feature map of resolution (H, W) vertically or horizontally into g regions of size (H/g, W) or (H, W/g), and performs attention calculation within each region, thereby significantly reducing computational complexity. This method realizes segmented feature processing through spatial reshaping, which retains the global receptive field while reducing the amount of calculation through simple region-division operations, thus improving speed while maintaining high performance. The large receptive field of A2 enables the model to capture global context, combined with position-sensitive encoding to enhance spatial information, so the activation area is more focused on the target body.

The calculation process of the regional self-attention module is shown in Fig 4. The input of this module is the multi-channel feature map X extracted by the previous network layer, with the shape of H * W * C. Here, H is the height of the feature map, W is the width, and C is the number of feature channels. The query matrix Q, key matrix K, and value matrix V in Eq (1) are all generated from the input feature map X through independent linear projection layers. Each linear projection layer is implemented by a 1×1 convolution with learnable weights. The Q matrix is specially designed to extract the horizontal structural features of the slender linear submarine pipeline and cable targets in SSS images; the K matrix is used to extract the vertical structural features of the pipeline and cable targets; the V matrix carries the complete spatial and channel feature information of the input, and is used to generate the final attention-weighted features.

After generation, Q, K and V are each divided into g equal regions along the spatial dimension, following the region-division rule of the A2 attention mechanism. For the i-th region, the corresponding sub-matrices are Qi, Ki and Vi. The regional self-attention calculation for the i-th region is defined in Eq (1),

Attention(Qi, Ki, Vi) = Softmax(Qi Kiᵀ / √dk + Bi) · Vi    (1)

where dk is the attention scaling factor. Its value is fixed as the channel dimension of the single-head attention feature, which is determined by the channel number of the generated Q and K matrices. It is used to avoid the gradient disappearance of the Softmax function caused by excessive inner product values of Q and K. Bi is the relative position bias matrix within the i-th region. It is a learnable parameter initialized before model training, and is updated synchronously during the end-to-end training process. It encodes the spatial position information of pixels in the region, to help the model capture the continuous linear structure of submarine pipeline targets. Softmax is the standard normalization function. It normalizes the similarity score between Qi and Ki into attention weights ranging from 0 to 1, to realize adaptive weighting of target-related features and background features.

After completing the attention calculation for all regions, the attention outputs of all regions are concatenated along the spatial dimension and combined with the original input feature map X through a residual connection. The final output of the regional attention module is defined in Eq (2),

Output = X + Concat(Attention(Q1, K1, V1), …, Attention(Qg, Kg, Vg))    (2)

In Eq (2), X represents the original input feature map of the A2 regional attention module, and Concat represents the feature concatenation operation. The residual connection structure helps to optimize gradient propagation during model training. The computational complexity of each region is (H · W / g)² · C, and the total complexity over g regions is (H² · W² / g) · C, which is g times lower than the H² · W² · C of global self-attention, enhancing the spatial expression ability of features at low computational cost.
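To make Eqs (1)–(2) concrete, here is a minimal numpy sketch of single-head area attention on a flattened feature map. The function name, the matmul stand-ins for the 1×1-convolution projections, and the omission of the learnable position bias Bi (treated as zero) are illustrative simplifications, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def area_attention(X, Wq, Wk, Wv, g=4):
    """Single-head A2-style area attention.

    X: (N, C) feature map flattened to N = H*W spatial positions.
    The positions are split into g equal areas; Eq (1) attention runs
    inside each area, then Eq (2) concatenates the per-area outputs
    and adds the residual. Bi is omitted (zero) in this sketch.
    """
    N, C = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv   # 1x1-conv projections as matmuls
    out = []
    for i in range(g):
        sl = slice(i * N // g, (i + 1) * N // g)
        Qi, Ki, Vi = Q[sl], K[sl], V[sl]
        # Eq (1): Softmax(Qi Ki^T / sqrt(dk)) Vi within one area
        out.append(softmax(Qi @ Ki.T / np.sqrt(C)) @ Vi)
    # Eq (2): residual connection with the concatenated area outputs
    return X + np.concatenate(out, axis=0)
```

Each area's score matrix is (N/g) × (N/g) instead of N × N, which is where the g-fold complexity reduction over global self-attention comes from.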

The core idea of R-ELAN is to improve the gradient flow and feature expression ability of the model by improving the residual connection and feature aggregation method. Simply applying layer scaling to each block does not solve the optimization challenge and increases latency, so the aggregation method is redesigned. The original ELAN layer first processes the input through a transition layer and then divides it into two parts for separate processing and concatenation. R-ELAN instead first generates a single feature map by adjusting the channel dimension through the transition layer, and then concatenates after subsequent module processing, forming a bottleneck structure.

DSConv module

Fig 5 shows the DSConv module network structure. DSConv consists of DWConv (Depthwise Convolution) and PWConv (Pointwise Convolution): DWConv extracts spatial features and PWConv extracts channel features.

DSConv first performs DWConv on each channel, and then merges all channels through PWConv to form the output feature map, reducing the amount of calculation and improving computational efficiency. Adding an activation function and BatchNorm afterwards improves the nonlinear expression ability of the network, giving it a stronger capacity to fit complex functions. The DSConv module process is shown in Fig 6.

The operational workflow of DSConv is illustrated in Fig 6. Taking the 3-channel input feature map in the example as reference, DWConv first applies an independent convolution kernel to each single input channel, and extracts spatial features separately for each channel without cross-channel feature interaction. Then, PWConv adopts 1×1 convolution kernels to process the output feature maps from DWConv, which completes the fusion of cross-channel features and adjusts the number of output channels. Compared with standard convolution, DSConv significantly reduces computational complexity and parameter count, which makes it more suitable for lightweight model deployment on resource-constrained edge devices.
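The two-step process can be sketched with a naive numpy implementation (valid padding, stride 1, no BatchNorm or activation; an illustration of the factorization, not the deployed module):

```python
import numpy as np

def depthwise_conv(x, dw):
    """DWConv: one k x k kernel per channel, no cross-channel mixing.
    x: (C, H, W); dw: (C, k, k)."""
    C, H, W = x.shape
    k = dw.shape[-1]
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw[c])
    return out

def pointwise_conv(x, pw):
    """PWConv: 1x1 convolution that fuses channels and sets the
    output channel count. x: (C, H, W); pw: (N_out, C)."""
    return np.tensordot(pw, x, axes=([1], [0]))

def dsconv(x, dw, pw):
    """Depthwise separable convolution: DWConv followed by PWConv."""
    return pointwise_conv(depthwise_conv(x, dw), pw)
```

In a framework such as PyTorch the same factorization is usually expressed as a grouped convolution (groups equal to the channel count) followed by a 1×1 convolution.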

Assuming that the size of the input feature map is Df × Df × M and the size of the convolution kernel is Dk × Dk × M, the total amount of calculation for N standard convolutions is Dk · Dk · M · N · Df · Df; the total DWConv calculation is Dk · Dk · M · Df · Df; the total PWConv calculation is M · N · Df · Df; and the total calculation of DSConv is Dk · Dk · M · Df · Df + M · N · Df · Df. The ratio of the computational load of DSConv to that of standard convolution is shown in Eq (3),

(Dk · Dk · M · Df · Df + M · N · Df · Df) / (Dk · Dk · M · N · Df · Df) = 1/N + 1/Dk²    (3)

In general, N is large, so the 1/N term can be ignored. Dk represents the size of the convolution kernel. The parameters and calculations of DSConv are thus reduced to roughly 1/Dk² of the original, yielding faster speed, easier porting, and high-precision operation on smaller devices.
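The saving in Eq (3) can be checked numerically; the sizes below (Df = 40, M = 64, N = 128, Dk = 3) are arbitrary example values, not the paper's layer dimensions:

```python
def conv_costs(Df, M, N, Dk):
    """Multiply counts for a Df x Df x M input map and N output channels."""
    standard = Dk * Dk * M * N * Df * Df               # standard convolution
    dsconv = Dk * Dk * M * Df * Df + M * N * Df * Df   # DWConv + PWConv
    return standard, dsconv

std, ds = conv_costs(Df=40, M=64, N=128, Dk=3)
ratio = ds / std  # equals 1/N + 1/Dk**2, as in Eq (3)
```

For Dk = 3 and large N the ratio approaches 1/9, i.e. roughly a ninefold reduction in multiplications.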

Lightweight network model

It can be seen that, given the slender, curved and low-resolution characteristics of the submarine pipeline target, the A2C2f module based on the regional attention mechanism can focus more on the target body. Its deformable convolution can adjust the sampling points along the pipeline direction, which alleviates the low detection accuracy and missed detection of submarine pipeline targets. Therefore, the original C3K2 module in the YOLO11-seg network model is replaced by the A2C2f module. Fig 7 shows the improved lightweight network structure of the YOLO11-seg model.

Fig 7. The improved lightweight network model structure of YOLO11-seg network model.

https://doi.org/10.1371/journal.pone.0346343.g007

It can be seen from Fig 7 that the C3K2 module focuses on feature parallelism. By replacing it with the A2C2f module, multiple convolution kernels capture multi-scale features, and A2 is embedded in the middle layers to focus on key areas earlier. For the slender, blur-edged SSS submarine pipeline targets, detection accuracy is improved and the missed-detection rate is reduced. In occlusion and low-contrast scenes, target features are extracted with regional-attention priority to improve segmentation integrity. At the same time, the DSConv module is introduced to achieve a more lightweight architecture by reducing both the parameter count and the computational complexity, greatly reducing computational cost while retaining the global receptive field.

The P3 layer is a shallow network, and the receptive field is relatively small. It generally contains more location and detail information. At the end of this stage, the A2C2f module is added to process the multi-scale feature weight distribution relationship and enhance the transmission of edge and texture features. At the same time, by weighting each channel, the attention to the useful detail feature channel is increased, which is helpful for the recognition of the submarine pipeline target by the small target detection head after the P3 layer feature fusion.

The P4 layer serves as the middle-layer network, and the A2C2f module is introduced at the end of this stage. The seabed image feature map is divided into four regions, and attention is calculated within each region. Since the receptive field of a large target covers multiple regions, the localization of regional attention has little effect on its global semantic information. The module can pay more attention to the spatial location of the submarine pipeline target, suppress unimportant seabed background features, reduce the propagation of invalid features, and improve positioning accuracy, while reducing the attention computation by 75% (consistent with g = 4 regions), completing high-quality feature aggregation and avoiding feature redundancy.

Real-time detection method process

The end-to-end workflow of the proposed lightweight model for real-time SSS pipeline and cable target detection is shown in Fig 8. The whole workflow includes four core steps: image preprocessing, dataset splitting, model iterative training, model prediction and performance evaluation.

Fig 8. The network model detection method flow, the network iterative training, get the model prediction and evaluation results.

https://doi.org/10.1371/journal.pone.0346343.g008

Image preprocessing is the foundational step for model training and inference. This step is performed on all SSS images before network input. The preprocessing pipeline is designed specifically for the characteristics of SSS submarine pipeline images, with detailed operations as follows:

All input SSS images are uniformly resized to a fixed resolution of 640 × 640 pixels, matching the input dimension required by the improved YOLO11n-seg model. The original aspect ratio of each image is preserved during resizing, and zero-padding fills the blank areas, which avoids shape deformation of the slender, linear pipeline and cable targets. SSS images inherently contain strong speckle noise caused by acoustic scattering in seawater, so Gaussian filtering is applied for smoothing; this suppresses background noise while retaining the edge details and linear structural features of pipeline and cable targets.
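A minimal sketch of these two preprocessing steps, assuming single-channel (grayscale) sonar images. The helper names are ours, the nearest-neighbour resize stands in for a proper bilinear resize, and the separable Gaussian kernel is a NumPy-only stand-in for a library filter:

```python
import numpy as np

def letterbox(img, size=640):
    """Resize keeping the aspect ratio, then zero-pad to size x size
    (avoids distorting slender pipeline/cable targets)."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour index mapping (stand-in for bilinear interpolation)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized   # zero-padded borders
    return out

def gaussian_smooth(img, sigma=1.0):
    """Separable Gaussian filter to suppress sonar speckle noise
    (sigma is an illustrative choice, not the paper's setting)."""
    r = int(3 * sigma)
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    pad = np.pad(img.astype(float), r, mode='edge')
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 0, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 1, tmp)
```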

After image preprocessing, the preprocessed original images and the augmented images generated by data enhancement are combined. The complete dataset is then randomly divided into training (70%), validation (20%) and test (10%) sets at a fixed 7:2:1 ratio. The three sets do not overlap, and the distribution of target features remains consistent across them. The training set is used for model weight learning, the validation set monitors overfitting risk during training, and the test set provides the final unbiased performance evaluation of the model.
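The non-overlapping 7:2:1 split can be sketched as below. The helper name and the fixed seed are our illustrative choices for reproducibility:

```python
import random

def split_dataset(paths, ratios=(0.7, 0.2, 0.1), seed=42):
    """Random, non-overlapping 7:2:1 train/val/test split of image paths."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)          # fixed seed -> reproducible split
    n = len(paths)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (paths[:n_train],                    # 70% training
            paths[n_train:n_train + n_val],     # 20% validation
            paths[n_train + n_val:])            # remaining ~10% test
```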

Experimental evaluation

Dataset

The Marine-PULSE dataset is the first public SSS dataset for marine engineering geology in the Bohai Sea, China. It was created by Du et al. [67] in the field of marine geological disasters and contains common SSS seabed images collected around the Yellow River estuary, intended for the automatic identification of marine engineering structures and common seabed landforms. It covers four typical target classes: submarine pipelines (323 images), seabed residual deposits (134), seabed surface (88), and engineering platform legs (82). The data were collected with a variety of advanced sonar equipment, such as the EdgeTech 4200FS and Benthos SIS-1624, ensuring diversity and reliability. This study uses the 323 submarine pipeline images, each of size 640 × 640 pixels.

The actual scene of submarine pipelines and cables is complex and changeable: the shape and position of pipelines and cables, illumination conditions, water quality, and other factors all affect how the image appears, so the original dataset alone cannot cover every possible situation. Data augmentation applies operations such as rotation, flipping, scaling, and noise injection to the original images, creating additional images with different perspectives, illumination conditions, and background interference to increase data diversity. The multi-type augmentation strategy adopted in this study effectively alleviates the overfitting caused by the limited number of original labeled samples and improves the generalization performance of the model. The 323 annotated underwater acoustic images in the dataset are augmented to 1615, allowing the model to learn more features and improving the generalization ability and robustness of the convolutional neural network.
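One possible augmentation scheme that yields the reported 5× expansion (323 × 5 = 1615) keeps each original image and generates four variants. The concrete operations and the noise level below are illustrative assumptions, not the authors' exact recipe:

```python
import numpy as np

def augment(img, rng):
    """Produce four augmented variants of a square sonar image; together with
    the original this gives a 5x dataset expansion (323 -> 1615 images)."""
    noisy = np.clip(img.astype(float) + rng.normal(0, 10, img.shape), 0, 255)
    return [
        np.rot90(img),            # 90-degree rotation
        np.fliplr(img),           # horizontal flip
        np.flipud(img),           # vertical flip
        noisy.astype(img.dtype),  # additive Gaussian noise (sigma=10, assumed)
    ]
```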

For the submarine pipeline target detection task, the dataset is divided into Train, Val and Test subsets at a 7:2:1 ratio, which allows the performance of the model to be evaluated reasonably, the state of the model at each stage to be tracked, and problems to be identified and corrected quickly.

Ablation experiment

To directly evaluate the impact of the proposed modules on the SSS pipeline detection model, ablation experiments were conducted. In the original YOLO11n-seg, the C3K2 module is replaced by the A2C2f module, and the Conv module is replaced by the DSConv module. After data augmentation, each variant is trained and compared on the Marine-PULSE dataset. The ablation results are given in Table 1.

From Table 1, the first group is the baseline YOLO11n-seg. In the second group, introducing the A2C2f module improves detection-box precision by 3.0%, effectively reducing false detections; both mAP@50 and mAP@50-95 improve, verifying the gain in detection accuracy at high IoU thresholds. In segmentation, both mAP@50 and mAP@50-95 also improve: segmentation fineness is noticeably enhanced and edges and details are captured better, at the cost of a slight increase in parameters. A2C2f thus optimizes feature expression at minimal cost and suits scenes with high detection-accuracy requirements. The third group introduces the DSConv module: the recall rate rises by 3.4%, a significant improvement reflecting fewer missed detections, while precision drops, indicating more false detections. In segmentation, recall increases but mAP@50-95 decreases slightly, showing that more real targets are detected at a small sacrifice in fineness; the parameter count drops markedly, reflecting that DSConv is an efficient convolution that greatly reduces computational cost. The fourth group introduces both the A2C2f and DSConv modules and is the model adopted in this study. On bounding boxes, recall increases by 9.7%, the largest reduction in missed detections among the three variants, while both mAP@50 and mAP@50-95 improve, maintaining high positioning accuracy. In segmentation, recall and precision both improve substantially, enhancing segmentation accuracy and integrity simultaneously, while the parameter and computation counts are reduced, achieving a lightweight model.

A2C2f focuses on accuracy and DSConv on efficiency; fused together, they achieve the best trade-off for scenarios that are both sensitive to missed detections and require real-time operation. The proposed model improves detection accuracy and sharply raises the recall rate at a lower computational cost, achieving a better balance between model complexity and detection accuracy for SSS submarine pipeline targets and meeting the real-time detection requirements of SSS equipment.

Gradient ablation experiments are conducted to verify the dependence of the proposed model’s performance on the size of training data. Under the premise of fixed validation set and test set, the original labeled images in the training set are randomly sampled at ratios of 10%, 30%, 50%, 70% and 100% for model training. All experimental groups adopt the same data augmentation strategy and training hyperparameters to ensure the uniqueness of control variables. The results show that the model’s detection and segmentation performance improves steadily with the increase of training samples, and it maintains stable and excellent performance under small training sample conditions. When the training sample size reaches 50% of the original training set, the model achieves more than 95% of the performance under the full training set, which is close to the optimal performance level. Even when only 30% of the original training samples are used, the model still retains more than 90% of the full training set performance for both bounding box detection and instance segmentation tasks. These results confirm that the proposed method has low dependence on the size of the labeled dataset. It has strong adaptability and engineering application value for practical underwater inspection scenarios where high-quality labeled sonar images are difficult to obtain in large quantities.
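The subsampling protocol for this gradient data ablation can be sketched as follows. The helper name and seed are ours; the fixed seed keeps the subsets nested, so each larger ratio strictly extends the smaller one while the validation and test sets stay fixed:

```python
import random

def sample_training_subsets(train_paths, ratios=(0.1, 0.3, 0.5, 0.7, 1.0), seed=0):
    """Randomly sample the original training images at fixed ratios for the
    gradient data-ablation experiment (val/test sets are left untouched)."""
    shuffled = list(train_paths)
    random.Random(seed).shuffle(shuffled)       # one shuffle -> nested prefixes
    return {r: shuffled[:int(len(shuffled) * r)] for r in ratios}
```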

Loss function

The total loss of the proposed model consists of two core parts: the detection branch loss and the instance segmentation branch loss. The final total loss is calculated as the weighted sum of each sub-loss term, as shown in Eq (4),

Ltotal = λdet Ldet + λmask Lmask  (4)

In Eq (4), Ldet represents the detection branch loss and Lmask the instance segmentation mask loss. λdet and λmask are the weight coefficients of the two loss terms, set to 1.0 and 0.5 respectively in this study. These weights are consistent with the default hyperparameters of the YOLO11 framework and balance the optimization of the detection and segmentation tasks.

The detection branch loss Ldet is composed of three sub-loss terms, and its calculation formula is shown in Eq (5),

Ldet = Lcls + Lbox + Lobj  (5)

The detection branch loss Ldet consists of three sub-terms: the classification loss Lcls, bounding box regression loss Lbox, and objectness loss Lobj. Among them, Lcls and Lobj are both calculated by Binary Cross-Entropy (BCE) loss, where Lcls measures the error between the predicted category probability and the ground truth label to optimize the classification accuracy of pipeline and cable targets, while Lobj evaluates the confidence of target existence in the predicted anchor box to reduce the false detection rate in complex seabed backgrounds; Lbox is computed with Complete Intersection over Union (CIoU) loss, which comprehensively considers the overlap area, center point distance and aspect ratio between the predicted bounding box and the ground truth box, to improve the positioning accuracy of the target bounding box.

The instance segmentation branch loss Lmask uses pixel-level Binary Cross-Entropy loss. It calculates the classification error of each pixel between the predicted mask and the ground truth mask. This loss term optimizes the contour segmentation integrity of pipeline and cable targets, especially for bending, fractured and partially sediment-covered targets.
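The loss terms of Eqs (4) and (5) can be illustrated with the following NumPy sketch of BCE, a single-pair CIoU loss, and the weighted total. This is a didactic re-implementation of the standard formulas, not the YOLO11 framework code; function names are ours:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy, used for Lcls, Lobj and the pixel-wise Lmask."""
    p = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def ciou_loss(b1, b2, eps=1e-7):
    """Complete-IoU loss for one (x1, y1, x2, y2) box pair: combines overlap,
    centre-point distance, and aspect-ratio consistency."""
    # intersection and union
    xi1, yi1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    xi2, yi2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    iou = inter / (a1 + a2 - inter + eps)
    # squared centre distance, normalised by the enclosing-box diagonal
    cw = max(b1[2], b2[2]) - min(b1[0], b2[0])
    ch = max(b1[3], b2[3]) - min(b1[1], b2[1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((b1[0] + b1[2] - b2[0] - b2[2]) ** 2
            + (b1[1] + b1[3] - b2[1] - b2[3]) ** 2) / 4
    # aspect-ratio consistency term
    v = (4 / np.pi ** 2) * (np.arctan((b2[2] - b2[0]) / (b2[3] - b2[1] + eps))
                            - np.arctan((b1[2] - b1[0]) / (b1[3] - b1[1] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return float(1 - iou + rho2 / c2 + alpha * v)

def total_loss(l_cls, l_box, l_obj, l_mask, w_det=1.0, w_mask=0.5):
    """Eqs (4)-(5): Ltotal = w_det * (Lcls + Lbox + Lobj) + w_mask * Lmask."""
    return w_det * (l_cls + l_box + l_obj) + w_mask * l_mask
```

For identical boxes the CIoU loss is essentially zero; for disjoint boxes the centre-distance term keeps the loss informative even though the IoU is zero, which is why CIoU improves bounding-box positioning.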

Comparative experiment

To comprehensively evaluate the performance of the proposed model, comparative experiments were conducted against other networks, with all models trained for 260 epochs on the augmented Marine-PULSE dataset. The results are given in Table 2.

From Table 2, the YOLO11n-seg + A2C2f + DSConv model proposed in this study improves mAP@50 over the other models for both detection boxes and segmentation. Its mAP@50-95 matches YOLO13n-seg and surpasses YOLO11n-seg and YOLO12n-seg. At the same time, the number of parameters is reduced by 15% and GFLOPs by 6%. Fig 9 compares the average precision and parameter counts of the different models: the proposed model is far lighter than the others and has a lower deployment cost, making it better suited to edge devices and real-time scenarios.

Fig 9. Comparison of mAP@50 and parameter counts of the different models; the radius of each circle represents GFLOPs.

https://doi.org/10.1371/journal.pone.0346343.g009

The processing time of each method is given in Table 3.

Table 3. Comparison of the average preprocessing, inference, and postprocessing time per image on the validation set for the different models.

https://doi.org/10.1371/journal.pone.0346343.t003

From Table 3, the model proposed in this study achieves a significant improvement in frames per second (FPS). With its superior real-time processing capability, the model is particularly suitable for applications requiring high-speed operation, effectively supporting real-time detection on edge devices.
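The end-to-end FPS follows directly from the per-image stage times reported in Table 3. The stage times below are illustrative placeholders, not the table's values:

```python
def throughput_fps(pre_ms, infer_ms, post_ms):
    """End-to-end frames per second from per-image preprocessing,
    inference, and postprocessing times (all in milliseconds)."""
    return 1000.0 / (pre_ms + infer_ms + post_ms)
```

For example, a 20 ms total per image corresponds to 50 FPS end to end.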

Overfitting analysis and robustness evaluation

Overfitting is a common challenge for deep learning models, especially in tasks with limited labeled samples. This study adopts multiple targeted strategies to mitigate overfitting and verifies the model's generalization ability through quantitative experiments. A strict non-overlapping dataset division and real-time training monitoring mechanism are established, and the synchronous, stable convergence of the training and validation loss curves confirms that no serious overfitting occurs during training. Multi-type data augmentation expands the original 323 labeled images to 1615 effective training samples, enriching data diversity and improving generalization. The lightweight design reduces model parameters by 15% compared with the baseline YOLO11n-seg, lowering fitting complexity and overfitting risk. Gradient dataset ablation experiments further show that the model retains over 95% of full-dataset performance with only 50% of the original labeled samples, and over 90% with only 30%, demonstrating excellent resistance to overfitting and strong generalization in small-sample scenarios.

Robustness is a critical index for the practical engineering application of models in complex underwater environments, and the proposed method exhibits strong robustness in multiple dimensions, verified by both module design and quantitative experiments. The A2C2f module with its regional attention mechanism effectively suppresses seabed background noise, yielding improvements of 9.7% and 10.3% in bounding box and instance segmentation recall, respectively, over the baseline model. The model maintains excellent detection and segmentation performance for pipeline targets that are bent, fractured, or partially covered by sediment, and it remains stable across different training set scales, degrading less than the baseline models when training samples are reduced. Meanwhile, the model achieves an end-to-end processing speed of 70 FPS, maintaining stable real-time performance on resource-constrained AUV edge devices and meeting the requirements of actual underwater inspection tasks.

Conclusion

In this study, aiming at the difficulties of SSS submarine pipeline target detection, the YOLO11n-seg model is improved with the A2C2f and DSConv modules on the basis of deep feature extraction, realizing autonomous detection of SSS submarine pipeline targets. Compared to YOLO11n-seg, the model shows consistent improvements: segmentation mAP@50 and bounding box mAP@50-95 improve by 3.6% and 1.6%, respectively, while recall for detection bounding boxes and instance segmentation improves by 9.7% and 10.3%, respectively. At the same time, the number of parameters is reduced by 15%, lowering the deployment cost and making the model better suited to edge devices and real-time scenarios. It has significant engineering application value in real-time SSS submarine pipeline and cable target processing, provides a feasible and reliable technical solution for marine survey and pipeline and cable feature extraction tasks, and has good prospects for development and promotion.

Several aspects of this model remain to be improved. Future work will explore style transfer to enrich high-level features and data content, and will investigate further module refinements for real-time detection scenarios, so that detection accuracy can be improved further while the model stays lightweight.

Acknowledgments

The authors sincerely appreciate the pioneering work of Dr. Du and his colleagues in the field of marine geological disasters in creating the Marine-PULSE dataset.

References

1. Zhang D, Zhang Y, Zhao B, Ma Y, Si K. Exploring subsea dynamics: a comprehensive review of underwater pipelines and cables. Physics of Fluids. 2024;36(10).
2. Khan A, Ali SSA, Anwer A, Adil SH, Meriaudeau F. Subsea pipeline corrosion estimation by restoring and enhancing degraded underwater images. IEEE Access. 2018;6:40585–601.
3. Zhao X, Wang X, Du Z. Research on detection method for the leakage of underwater pipeline by YOLOv3. In: 2020 IEEE International Conference on Mechatronics and Automation (ICMA). 2020. p. 637–42.
4. Kartal SK, Cantekin RF. Autonomous underwater pipe damage detection positioning and pipe line tracking experiment with unmanned underwater vehicle. JMSE. 2024;12(11):2002.
5. Hong X, Huang L, Gong S, Xiao G. Shedding damage detection of metal underwater pipeline external anticorrosive coating by ultrasonic imaging based on HOG + SVM. JMSE. 2021;9(4):364.
6. Zhang X, Tan C, Ying W. An imaging algorithm for multireceiver synthetic aperture sonar. Remote Sensing. 2019;11(6):672.
7. Tan C, Zhang X, Yang P, Sun M. A novel sub-bottom profiler and signal processor. Sensors (Basel). 2019;19(22):5052. pmid:31752419
8. Zhang X, Yang P, Dai X. Focusing multireceiver SAS data based on the fourth-order legendre expansion. Circuits Syst Signal Process. 2018;38(6):2607–29.
9. Zhang X, Dai X, Yang B. Fast imaging algorithm for the multiple receiver synthetic aperture sonars. IET Radar Sonar & Navi. 2018;12(11):1276–84.
10. Zhang X, Yang P, Sun H. Frequency-domain multireceiver synthetic aperture sonar imagery with Chebyshev polynomials. Electronics Letters. 2022;58(25):995–8.
11. Zhang X, Yang P, Wang J, Zhu J. Focus improvement of multireceiver SAS based on range-doppler algorithm. IEEE Trans Instrum Meas. 2026;75:1–14.
12. Zhang X, Yang P. Back projection algorithm for multi-receiver synthetic aperture sonar based on two interpolators. JMSE. 2022;10(6):718.
13. Zhang X, Yang P, Feng X, Sun H. Efficient imaging method for multireceiver SAS. IET Radar Sonar & Navi. 2022;16(9):1470–83.
14. Zhang X, Yang P, Huang P, Sun H, Ying W. Wide-bandwidth signal-based multireceiver SAS imagery using extended chirp scaling algorithm. IET Radar Sonar & Navi. 2021;16(3):531–41.
15. Zhang X. An efficient method for the simulation of multireceiver SAS raw signal. Multimed Tools Appl. 2023;83(13):37351–68.
16. Zerr B, Mailfert G, Bertholom A, Ayreault H. Sidescan sonar image processing for AUV navigation. In: Europe Oceans 2005. 2005. p. 124–130 Vol. 1. https://doi.org/10.1109/oceanse.2005.1511696
17. Zhang J, Xie Y, Ling L, Folkesson J. A dense subframe-based SLAM framework with side-scan sonar. 2023. https://arxiv.org/abs/2312.13802
18. Davy CM, Fenton MB. Technical note: side-scan sonar enables rapid detection of aquatic reptiles in turbid lotic systems. Eur J Wildl Res. 2012;59(1):123–7.
19. Bryant R. Side scan sonar for hydrography-an evaluation by the Canadian hydrographic service. The International Hydrographic Review. 1975.
20. Kumagai H, Tsukioka S, Yamamoto H, Tsuji T, Shitashima K, Asada M. Hydrothermal plumes imaged by high-resolution side-scan sonar on a cruising AUV, Urashima. Geochemistry, Geophysics, Geosystems. 2010;11(12).
21. Zhang H, Zhang S, Wang Y, Liu Y, Yang Y, Zhou T, et al. Subsea pipeline leak inspection by autonomous underwater vehicle. Applied Ocean Research. 2021;107:102321.
22. Rumson AG. The application of fully unmanned robotic systems for inspection of subsea pipelines. Ocean Engineering. 2021;235:109214.
23. Feng H, Yu J, Huang Y, Cui J, Qiao J, Wang Z, et al. Automatic tracking method for submarine cables and pipelines of AUV based on side scan sonar. Ocean Engineering. 2023;280:114689.
24. Tosello E, Bonel P, Buranello A, Carraro M, Cimatti A, Granelli L, et al. Opportunistic (Re)planning for long-term deep-ocean inspection: an autonomous underwater architecture. IEEE Robot Automat Mag. 2024;31(1):72–83.
25. Zhu B, Wang X, Chu Z, Yang Y, Shi J. Active learning for recognition of shipwreck target in side-scan sonar image. Remote Sensing. 2019;11(3):243.
26. Feng D, Harakeh A, Waslander SL, Dietmayer K. A review and comparative study on probabilistic object detection in autonomous driving. IEEE Trans Intell Transport Syst. 2022;23(8):9961–80.
27. Meus B, Kryjak T, Gorgon M. Embedded vision system for pedestrian detection based on HOG+SVM and use of motion information implemented in Zynq heterogeneous device. In: 2017 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA). 2017. p. 406–11. https://doi.org/10.23919/spa.2017.8166901
28. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The Pascal Visual Object Classes (VOC) Challenge. Int J Comput Vis. 2009;88(2):303–38.
29. Jin L, Gao S, Li Z, Tang J. Hand-crafted features or machine learnt features? Together they improve RGB-D object recognition. In: 2014 IEEE International Symposium on Multimedia. 2014. p. 311–9. https://doi.org/10.1109/ism.2014.56
30. Hua W, Chen Q. A survey of small object detection based on deep learning in aerial images. Artif Intell Rev. 2025;58(6).
31. Nisa U. Image augmentation approaches for small and tiny object detection in aerial images: a review. Multimed Tools Appl. 2024;84(19):21521–68.
32. Li L, Li Y, Yue C, Xu G, Wang H, Feng X. Real-time underwater target detection for AUV using side scan sonar images based on deep learning. Applied Ocean Research. 2023;138:103630.
33. Chen C, Zhong J, Tan Y. Multiple-oriented and small object detection with convolutional neural networks for aerial image. Remote Sensing. 2019;11(18):2176.
34. Zhang X, Han L, Han L, Zhu L. How well do deep learning-based methods for land cover classification and object detection perform on high resolution remote sensing imagery?. Remote Sensing. 2020;12(3):417.
35. van Noord N, Postma E. Learning scale-variant and scale-invariant features for deep image classification. Pattern Recognition. 2017;61:583–92.
36. Ma W, Wu Y, Cen F, Wang G. MDFN: multi-scale deep feature learning network for object detection. Pattern Recognition. 2020;100:107149.
37. Stowell D, Plumbley MD. Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ. 2014;2:e488. pmid:25083350
38. Greenhalgh J, Mirmehdi M. Real-time detection and recognition of road traffic signs. IEEE Trans Intell Transport Syst. 2012;13(4):1498–506.
39. Yao J, Qi J, Zhang J, Shao H, Yang J, Li X. A real-time detection algorithm for kiwifruit defects based on YOLOv5. Electronics. 2021;10(14):1711.
40. Lv B, Wu L, Huangfu T, He J, Chen W, Tan L. Traditional chinese medicine recognition based on target detection. Evid Based Complement Alternat Med. 2022;2022:9220443. pmid:35845589
41. Solunke BR, Gengaje SR. A review on traditional and deep learning based object detection methods. In: 2023 International Conference on Emerging Smart Computing and Informatics (ESCI). 2023. p. 1–7. https://doi.org/10.1109/esci56872.2023.10099639
42. Ma W, Wu Y, Cen F, Wang G. MDFN: multi-scale deep feature learning network for object detection. Pattern Recognition. 2020;100:107149.
43. Tan H, Zheng L, Ma C, Xu Y, Sun Y. Deep learning-assisted high-resolution sonar detection of local damage in underwater structures. Automation in Construction. 2024;164:105479.
44. Dong B, Wang X. Comparison deep learning method to traditional methods using for network intrusion detection. In: 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN). 2016. p. 581–5. https://doi.org/10.1109/iccsn.2016.7586590
45. Munir M, Chattha MA, Dengel A, Ahmed S. A comparative analysis of traditional and deep learning-based anomaly detection methods for streaming data. In: 2019 18th IEEE International Conference On Machine Learning and Applications (ICMLA). 2019. p. 561–6. https://doi.org/10.1109/icmla.2019.00105
46. Karimanzira D, Renkewitz H, Shea D, Albiez J. Object detection in sonar images. Electronics. 2020;9(7):1180.
47. Sandoval C, Pirogova E, Lech M. Two-stage deep learning approach to the classification of fine-art paintings. IEEE Access. 2019;7:41770–81.
48. Suárez-Paniagua V, Rivera Zavala RM, Segura-Bedmar I, Martínez P. A two-stage deep learning approach for extracting entities and relationships from medical texts. J Biomed Inform. 2019;99:103285. pmid:31546016
49. Khan FA, Gumaei A, Derhab A, Hussain A. TSDL: a two-stage deep learning model for efficient network intrusion detection. IEEE Access. 2019;7:30373–85.
50. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. 2014. p. 580–7. https://doi.org/10.1109/cvpr.2014.81
51. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 2015;28.
52. Bi H, Wen V, Xu Z. Comparing one-stage and two-stage learning strategy in object detection. ACE. 2023;5(1):171–7.
53. Wang T, Yang F, Tsui K-L. Real-time detection of railway track component via one-stage deep learning networks. Sensors (Basel). 2020;20(15):4325. pmid:32756365
54. El-Saadawy H, Tantawi M, Shedeed HA, Tolba MF. One-stage vs two-stage deep learning method for bone abnormality detection. Advances in Intelligent Systems and Computing. Springer International Publishing; 2021. p. 122–32. https://doi.org/10.1007/978-3-030-76346-6_12
55. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 779–88. https://doi.org/10.1109/cvpr.2016.91
56. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, et al. SSD: single shot multibox detector. Lecture Notes in Computer Science. Springer International Publishing; 2016. p. 21–37. https://doi.org/10.1007/978-3-319-46448-0_2
57. Lin T-Y, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV). 2017. p. 2999–3007. https://doi.org/10.1109/iccv.2017.324
58. Zhang H, Cloutier RS. Review on one-stage object detection based on deep learning. EAI Endorsed Transactions on E-Learning. 2022;7(23).
59. Jiang P, Ergu D, Liu F, Cai Y, Ma B. A review of yolo algorithm developments. Procedia Computer Science. 2022;199:1066–73.
60. Terven J, Córdova-Esparza D-M, Romero-González J-A. A comprehensive review of YOLO architectures in computer vision: from YOLOv1 to YOLOv8 and YOLO-NAS. MAKE. 2023;5(4):1680–716.
61. Chen C, Zheng Z, Xu T, Guo S, Feng S, Yao W, et al. YOLO-based UAV technology: a review of the research and its applications. Drones. 2023;7(3):190.
62. Hussain M. YOLOv1 to v8: unveiling each variant–a comprehensive review of YOLO. IEEE Access. 2024;12:42816–33.
63. Bharati P, Pramanik A. Deep learning techniques—R-CNN to mask R-CNN: a survey. Advances in Intelligent Systems and Computing. Springer Singapore; 2019. p. 657–68. https://doi.org/10.1007/978-981-13-9042-5_56
64. Chai J, Zeng H, Li A, Ngai EWT. Deep learning in computer vision: a critical review of emerging techniques and application scenarios. Machine Learning with Applications. 2021;6:100134.
65. Hongtao W, Xi Y. Object detection method based on improved one-stage detector. In: 2020 5th International Conference on Smart Grid and Electrical Automation (ICSGEA). 2020. p. 209–12. https://doi.org/10.1109/icsgea51094.2020.00051
66. Einsidler D, Dhanak M, Beaujean PP. In: OCEANS 2018 MTS/IEEE Charleston. 2018. p. 1–4.
67. Du X, Sun Y, Song Y, Dong L, Zhao X. Revealing the potential of deep learning for detecting submarine pipelines in side-scan sonar images: an investigation of pre-training datasets. Remote Sensing. 2023;15(19):4873.
68. Li L, Li Y, Wang H, Yue C, Gao P, Wang Y, et al. Side-scan sonar image generation under zero and few samples for underwater target detection. Remote Sensing. 2024;16(22):4134.
69. Zheng Y, Yan J, Meng J, Liang M. A small-sample target detection method of side-scan sonar based on CycleGAN and improved YOLOv8. Applied Sciences. 2025;15(5):2396.
70. Zheng G, Zhao J, Li S, Feng J. Zero-shot pipeline detection for sub-bottom profiler data based on imaging principles. Remote Sensing. 2021;13(21):4401.
71. Fu S, Xu F, Liu J, Pang Y, Yang J. Underwater small object detection in side-scan sonar images based on improved YOLOv5. In: 2022 3rd International Conference on Geology, Mapping and Remote Sensing (ICGMRS). 2022. p. 446–53.
72. Cheng C, Wang C, Yang D, Wen X, Liu W, Zhang F. Underwater small target detection based on dynamic convolution and attention mechanism. Front Mar Sci. 2024;11.
73. Wang J, Wang Q, Gao G, Qin P, He B. Improving Yolo5 for real-time detection of small targets in side scan sonar images. J Ocean Univ China. 2023;22(6):1551–62.
74. Zhou X, Tian K, Zhou Z. STGAN: sonar image despeckling method utilizing GAN and transformer. In: International Symposium on Artificial Intelligence and Robotics. Springer; 2023. p. 56–67.
75. Lee B, Ku B, Kim W, Kim S, Ko H. Feature sparse coding with CoordConv for side scan sonar image enhancement. IEEE Geosci Remote Sensing Lett. 2022;19:1–5.
76. Liu Y, Wu Y, Li G, Abbas A, Shi T. Submarine cable detection using an end-to-end neural network-based magnetic data inversion. Journal of Geophysics and Engineering. 2024;21(3):884–96.
77. Duan B, Wang S, Luo C, Chen Z. Multi-module fusion model for submarine pipeline identification based on YOLOv5. JMSE. 2024;12(3):451.
78. Li Y, Wu M, Guo J, Huang Y. A strategy of subsea pipeline identification with sidescan sonar based on YOLOV5 model. In: 2021 21st International Conference on Control, Automation and Systems (ICCAS). 2021. p. 500–5. https://doi.org/10.23919/iccas52745.2021.9649828
79. Elmezain M, Saad Saoud L, Sultan A, Heshmat M, Seneviratne L, Hussain I. Advancing underwater vision: a survey of deep learning models for underwater object recognition and tracking. IEEE Access. 2025;13:17830–67.
80. Chitty-Venkata KT, Mittal S, Emani M, Vishwanath V, Somani AK. A survey of techniques for optimizing transformer inference. Journal of Systems Architecture. 2023;144:102990.
81. Yang N, Li G, Wang S, Wei Z, Ren H, Zhang X, et al. SS-YOLO: a lightweight deep learning model focused on side-scan sonar target detection. JMSE. 2025;13(1):66.
82. Zheng G, Zhao J, Li S, Feng J. Zero-shot pipeline detection for sub-bottom profiler data based on imaging principles. Remote Sensing. 2021;13(21):4401.
83. Dakhil RA, Khayeat ARH. Review on deep learning technique for underwater object detection. arXiv preprint. 2022. https://arxiv.org/abs/2209.10151
84. Zhang X, Yang P. Imaging algorithm for multireceiver synthetic aperture sonar. J Electr Eng Technol. 2019;14(1):471–8.
85. Zhang X, Yang P. An improved imaging algorithm for multi-receiver SAS system with wide-bandwidth signal. Remote Sensing. 2021;13(24):5008.
86. Zhang X, Yang P, Sun H. An omega-k algorithm for multireceiver synthetic aperture sonar. Electronics Letters. 2023;59(13).
87. Xu D, Xu Z, Lin L, Zheng J, Song L, Ji Z. JMUCOD: joint maritime-underwater cross-domain object detection algorithm. IEEE J Sel Top Appl Earth Observations Remote Sensing. 2026;19:564–79.
88. Xu Z, Xu D, Lin L, Song L, Song D, Sun Y, et al. Integrated object detection and communication for synthetic aperture radar images. IEEE J Sel Top Appl Earth Observations Remote Sensing. 2025;18:294–307.
89. Su J, Xu D, Qiu L, Xu Z, Lin L, Zheng J. A high-accuracy underwater object detection algorithm for synthetic aperture sonar images. Remote Sensing. 2025;17(13):2112.
90. Xu D, He Y, Su J, Qiu L, Lin L, Zheng J, et al. An ultra-lightweight and high-precision underwater object detection algorithm for SAS images. Remote Sensing. 2025;17(17):3027.
91. Sun S, Xu Z, Cao X, Zheng J, Yang J, Jin N. A high-performance and lightweight maritime target detection algorithm. Remote Sensing. 2025;17(6):1012.
92. Xu Z, Xu D, He Y, Lin L, Zheng J. A hybrid filtering method with no-reference quality assessment for synthetic aperture sonar images. PLoS One. 2025;20(11):e0332458. pmid:41252426
93. Zheng J, Zhao S, Xu Z, Zhang L, Liu J. Anchor boxes adaptive optimization algorithm for maritime object detection in video surveillance. Front Mar Sci. 2023;10.