Abstract
Ship detection over long distances is crucial for the visual perception of intelligent ships. Traditional image-processing methods are not robust, whereas deep-learning-based methods can automatically learn the features of small ships. However, because ships at long distances occupy only a few pixels, accurate features of such ships are difficult to obtain. To address this, a two-stage object detection method that combines the advantages of traditional and deep-learning methods is proposed. In the first stage, an object detection model for the sea-sky line (SSL) region is trained to select a potential region of ships. In the second stage, another object detection model for ships is trained using sliced patches containing ships. At test time, the SSL region is first detected using a trained model of the 8th version of You Only Look Once (YOLOv8). The detected SSL region is then divided into several overlapping patches using the slicing technique, and another trained YOLOv8 model is applied to detect ships. The experimental results showed that our method achieved 85% average precision at an intersection over union of 0.5 (AP50) and a detection speed of 75 ms per image with a pixel size of 1080×640. The code is available at https://github.com/gongyanfeng/PaperCode.
Citation: Gong Y, Chen Z, Tan J, Yin C, Deng W (2024) Two-stage ship detection at long distances based on deep learning and slicing technique. PLoS ONE 19(11): e0313145. https://doi.org/10.1371/journal.pone.0313145
Editor: Qian Zhang, Jiangsu Open University, CHINA
Received: June 26, 2024; Accepted: October 19, 2024; Published: November 19, 2024
Copyright: © 2024 Gong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Supplementary Materials: The code is available at https://github.com/gongyanfeng/PaperCode and the ship dataset is available at https://github.com/gongyanfeng/dataset.
Funding: This research was funded by the Chongqing Jiaotong University Graduate Research Innovation, grant No. 2023S0075. “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript”.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
With the development of artificial intelligence, intelligent ships have attracted increasing attention because they can reduce transportation costs and increase navigation safety. Visual perception acts as the eyes of an intelligent ship. To guarantee the safety of intelligent ships, precise ship detection is essential: the farther away a ship can be detected, the safer the intelligent ship will be. Nearby ships can be detected easily, but detection results worsen as the distance increases. In particular, when a ship is at a long distance, accurate features are difficult to obtain because of its limited pixels. In this paper, we focus on detecting ships at long distances at sea.
According to the way images are obtained, ship detection methods are usually divided into three categories: those based on visual images, radar images, and remote-sensing images. Radar offers the longest range for ship detection and is widely deployed aboard ships; however, radar images have limited features, making them challenging to recognize using machine vision. Methods based on remote-sensing images are often time-consuming. Visual images have rich features and are the most widely used in intelligent ship technology [1]. Ship detection methods for visual images are usually divided into two types: traditional and deep-learning methods. Ships at long distances in marine environments often appear near the SSL. Thus, a traditional method usually consists of four steps: preprocessing, SSL detection, ROI extraction, and recognition [2]. Deep-learning methods are usually based on convolutional neural networks (CNNs). Unlike general object detection in terrestrial environments, ship detection is performed against complex water-surface backgrounds affected by seabirds, light, wind, water waves, and other factors, which presents significant challenges.
In the early stages, ship detection relied on traditional image-processing methods [3–6]. These methods usually use manually pre-defined features to find ships in images. Such handcrafted features vary with the distance of the ship, the type of ship, and the weather, making them not robust in applications. Because images with an SSL background have a boundary between the sky and the ocean, and ships at long distances usually appear near the SSL, most ship detection methods for these images detect the SSL first and then recognize the surrounding ships [7–9]. In recent years, deep learning has achieved great success in object detection and has attracted increasing attention in ship detection [10–14]. Compared with traditional methods, deep-learning-based methods can automatically obtain features and are more robust. However, several difficulties remain in applying deep-learning-based methods to intelligent ship technology. First, ships at long distances appear very small in images, similar in size to seabirds, clouds, and some water waves, and similar in appearance to close-range buoys, making the detection results susceptible to interference from these deceptive objects. Second, ships at long distances appear in low contrast in bad weather, such as fog and rain. Finally, long-distance ship detection using deep learning lacks sufficient sample data to train the model.
Although some studies have attempted to reduce the impact of seabirds, clouds, and fog [8–9], these methods are not robust enough and have limited effectiveness. Other works focus on small-ship detection [15–17], but most of them target remote-sensing images taken from a top-down perspective, unlike the frontal perspective of our long-distance ship images. In addition, several models have been proposed to improve the accuracy on small targets [18–20]; these methods are compared with ours in later experiments.
To sum up, the robustness of traditional image-processing methods is poor, and the accuracy of deep-learning methods needs to be improved.
To address ship detection at long distances in marine environments, we proposed a two-stage method that combines the advantages of traditional and deep-learning methods, making full use of the SSL. Specifically, we first detected the SSL region in a high-resolution image using a lightweight and accurate object detection model, because the SSL is an obvious feature of images captured over long distances. Second, YOLOv8 combined with the slicing technique was utilized to detect ships in the resulting SSL region, so that ship features did not need to be compressed in the input images.
The main contributions of this study are as follows.
- We constructed a long-distance ship dataset, which is applicable for the visual perception of intelligent ships (available at https://drive.google.com/drive/folders/1WKjZYdarcy4PYKcjg0idqQDWStrPLvUv?usp=sharing).
- A novel ship detection method at long distances was proposed. The proposed method divides the detection of ship targets into two steps: SSL detection and ship detection, greatly improving the ship detection accuracy while ensuring detection speed.
- Ship detection combined with slicing technique was proposed, greatly improving the small ship detection accuracy.
- Our method is suitable for high-resolution image capture devices, which can obtain high-quality images of ships at long distances, and is more conducive to visual perception of intelligent ships.
In this paper, Section 2 introduces the related work that is relevant to our method. Section 3 presents the proposed method. Section 4 details the experiments and their corresponding results. Section 5 outlines the conclusions.
2. Related works
2.1. Ship detection
Before deep learning became popular, ship detection at long distances relied on traditional image-processing methods. Zhang et al. [21] first detected the SSL using a discrete cosine transform and then performed background modeling and background subtraction for ship detection. Liang et al. [9] and Lin et al. [6] utilized gradient features to detect the SSL for ship detection to prevent the influence of local noise; however, they had to choose appropriate handcrafted features. Under certain weather conditions, ship images at sea may have low contrast, and the accuracy of SSL and ship detection is poor in this case. Shan et al. [7] used SSL and saliency detection to alleviate this effect of the weather. Besides the weather, detection results are easily affected by sea clutter. A frequency-domain method was adopted for ROI extraction in such scenarios [5]; it alleviates this influence but is not applicable when the sea clutter is strong. Li et al. [4] used prior knowledge, such as aspect ratio, contrast ratio, ship size, and grayscale, to identify ships around the SSL and improve detection accuracy. However, these features are handcrafted and not robust enough. Therefore, adopting traditional methods for complex scenes is challenging and requires selecting appropriate handcrafted features, which exhibit poor robustness.
With the development of neural networks, deep-learning-based methods have attracted increasing attention in ship detection. At present, there are two kinds of deep-learning-based object detection methods: (a) two-stage detectors, such as SPP-Net [22], Fast R-CNN [23], and Faster R-CNN [24]; (b) one-stage detectors, such as the You Only Look Once (YOLO) series [25–31], SSD [32], RetinaNet [33], and CenterNet [34]. Marie et al. [35] described a Fast R-CNN-based maritime object-detection method that achieved satisfactory results. This method divided the detection of ship targets into three steps but did not consider the SSL. Qi et al. [36] proposed an improved Faster R-CNN algorithm for ship detection that significantly shortened the detection time while improving the detection accuracy. However, these methods are time-consuming. To improve detection speed and accuracy simultaneously, many researchers have proposed variants of object detection models. Shan et al. [37] proposed a novel method called SiamFPN for ship detection at sea. Wang et al. [38] presented an improved YOLOv3 algorithm that realizes an end-to-end ship target detection system, making the application of deep-learning methods in ship detection feasible. Lee et al. [39] proposed an improved one-stage model that detects floating objects at sea, including ships, at 30 frames per second. This method did not take into account ship detection at long distances or the influence of weather. To reduce the influence of weather, Liu et al. [40] proposed an enhanced CNN, improving ship detection. Wang et al. [11] constructed a real-time ship target detection method based on YOLOv4, improving the detection speed and accuracy; they also did not consider ship detection at long distances. Zhang et al. [41] and Qin et al. [42] presented improved methods based on YOLOv3, whose accuracy was satisfactory for sea-surface ships of general sizes. Li et al. [43] proposed a ship detection and recognition method based on a multilevel hybrid network combining traditional image processing and deep-learning methods to improve detection speed. Xu et al. [44] proposed LMO-YOLO to address the high false detection rate of ships in low-resolution images. Liu and Zhu [45] proposed a residual YOLOX-based ship object detection model that is applicable to ship image detection in ports; this method focuses on ship detection in optical satellite images, where the SSL is absent. Zhou et al. [13] presented an improved YOLOv5 model for ship detection, greatly increasing the detection speed. Wang et al. [31] proposed the YOLOv10 model based on previous versions of YOLO, further improving the performance of the YOLO series. These deep-learning-based methods have achieved good results for ship detection. However, they are not applicable to ship targets at long distances at sea, which is a small-target detection problem.
Long-distance ship target detection is a subcategory of generic ship target detection and has attracted increasing attention. Wei et al. [46] presented a small-target detection method based on hierarchical and multi-scale convolutional neural networks, aiming to detect maritime targets in complex scenarios; its detection speed needs to be improved. Chen et al. [47] proposed a novel hybrid deep-learning method that combined a modified generative adversarial network (GAN) and a CNN-based detection approach for small-ship detection. This method can alleviate the overfitting caused by insufficient sample data; however, it is not suitable for ship detection at long distances. Nina et al. [48] compared YOLO and YOLT in detecting small ship objects; their results indicate that the accuracy of both YOLO and YOLT for small ship objects must be improved. Yu et al. [49] presented a modified YOLOv3 model for small-scale ship target detection, improving both the recall rate and accuracy rate. Hu et al. [50] proposed a novel small-ship detection method based on YOLOv4 to address the problem that global and local relationships in the input image are rarely considered. Escorcia-Gutierrez et al. [51] presented an efficient optimal mask-CNN technique for small-ship detection using autonomous shipping technologies and obtained satisfactory results. Similarly, these methods only considered small-target detection caused by the small size of the ship itself and did not consider the scenario in which the ship is far away. Recently, Wang et al. [52] and Qu et al. [53] combined the YOLOX algorithm with a convolutional block attention module (CBAM) to improve the accuracy on small ship targets; experiments showed that CBAM can extract better features for small ships. Sun et al. [16] proposed a novel model for small object detection.
However, this method focused on small-ship detection from a bird's-eye view, which is not suitable for the visual perception scenario of intelligent ships. Guo et al. [17] constructed a multi-scale keypoint-based detector to improve tiny object detection. These methods can obtain more fine-grained features. However, to detect ships at longer distances, high-resolution cameras are usually used, acquiring images with a pixel size much larger than 640×640. This means that images fed into the model must be compressed, losing many valuable features.
2.2. YOLOv8 model
Among deep-learning methods, the YOLO series has attracted more and more attention. These methods are one-stage CNN-based detectors; compared with two-stage detectors, they achieve a better trade-off between accuracy and detection speed. At the time of writing, the latest version of YOLO is YOLOv10 [31], but when we started this work, the most popular version was YOLOv8 [30]. YOLO [25] is the first version of the series. It directly maps each feature pixel of the grid cell to bounding boxes and class probabilities that consider the global information of the input image, resulting in a faster detection speed. In addition, YOLO set a precedent for anchor-free object detection. Owing to the superiority of the anchor-box strategy in two-stage detectors, Redmon and Farhadi proposed YOLOv2 [26], which re-adopted the anchor-box strategy and introduced multi-scale training, improving the accuracy of YOLO. To further improve its performance, Redmon and Farhadi [27] proposed YOLOv3 by introducing ResNet, FPN, the Adam optimizer, and independent logistic classifiers. Based on YOLOv3, YOLOv4 [28] adopts mosaic and CutMix data augmentation, CSPNet [54], CIOU [55], and label smoothing. Among the later improved versions, YOLOv5 [29] and YOLOv8 [30] have been used most frequently. YOLOv5 introduces an adaptive anchor box and neighborhood positive-negative sample allocation strategies based on YOLOv4, accelerating convergence. Compared with YOLOv5, YOLOv8 has a decoupled head and is anchor-free; additionally, it replaces CSPNet with a C2f structure and adopts a distribution focal loss with a richer gradient flow.
YOLOv8 consists of three parts: backbone, neck, and head. The backbone extracts features and adopts the C2f structure and SPPF. The neck fuses multiscale features. The head uses decoupled structures for classification and bounding-box regression.
The biggest highlight of YOLOv8 is the use of the C2f structure, which improves on the idea of CSPNet. The C2f structure is shown in Fig 1: the gradients on the right-hand branch and the feature map on the left-hand branch are integrated separately, so neither side contains duplicate gradient information. Thus, C2f preserves the feature-reuse advantage of the right-hand branch while preventing excessive duplicate gradient information by truncating the gradient flow. The C2f structure has three advantages: 1) it enhances the learning ability of the model while keeping it lightweight and improving its accuracy; 2) it reduces the computation bottleneck; and 3) it reduces memory costs.
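As an illustration, the feature splitting and reuse in C2f can be sketched in PyTorch as follows. This is a minimal sketch based on the description above, not the Ultralytics source: activations, batch normalization, and the official channel settings are omitted, and all class and argument names are ours.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # Residual block used inside C2f (simplified: two 3x3 convs, optional shortcut).
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = nn.Conv2d(c, c, 3, 1, 1)
        self.cv2 = nn.Conv2d(c, c, 3, 1, 1)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C2f(nn.Module):
    # Split the features after cv1 into two branches; one branch is refined by a
    # chain of bottlenecks whose intermediate outputs are all kept and concatenated,
    # so each gradient path is integrated only once (no duplicate gradient flow).
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = nn.Conv2d(c_in, 2 * self.c, 1)
        self.cv2 = nn.Conv2d((2 + n) * self.c, c_out, 1)
        self.m = nn.ModuleList(Bottleneck(self.c) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))   # left / right branches
        y.extend(m(y[-1]) for m in self.m)      # keep every bottleneck output for reuse
        return self.cv2(torch.cat(y, dim=1))
```

The concatenation of all intermediate outputs is what gives C2f its richer gradient flow compared with the CSP bottleneck of YOLOv5.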
3. Proposed method
In this study, we proposed a two-stage YOLOv8 for long-distance ship detection at sea. The architecture of the proposed two-stage YOLOv8 is shown in Fig 2. It consists of two stages: the selection of the SSL region and ship detection. Additionally, we constructed a ship dataset suitable for the visual perception of intelligent ships.
3.1. Ship dataset
There are some public datasets of visual images for ship detection [56–58], but they are not captured from a long-distance perspective, and the SSL is not present in all of their images. Moreover, there is no unified public dataset for long-distance ship target detection. We therefore collected a ship dataset containing 1871 ship images. When capturing the pictures, the target ships were far from the optical camera. This matches the context of intelligent ship technology, in which two ships keep a considerable distance from each other. As a result, the ship in each captured image is positioned close to the SSL. As shown in Fig 3, the ship appears very small because of its long distance from the camera, and it lies near the SSL. Over long distances, seabirds and clouds can easily be mistaken for ships: the parts circled by ellipses in Fig 4a are seabirds, and those circled in Fig 4b are clouds; all have features similar to those of ships over long distances. Additionally, waves formed on the sea surface by strong winds are highly similar to distant ships. Samples with these characteristics were collected to demonstrate the superiority of our method. The dataset was divided into a training set and a validation set at a ratio of 9:1; the validation set was also used as the test set. The details of the dataset are listed in Table 1.
(a) True ship targets are circled by rectangles, and false ship targets circled by ellipses are seabirds; (b) true ship targets are circled by rectangles, and false ship targets circled by ellipses are clouds.
Fig 5 illustrates the distribution of ships in the images. There were 4348 ship instances in the 1871 image samples, and the number of ships per image ranged from 1 to 14. Most images contain only one ship instance; there are 958 such images, as shown in the first cylindrical cluster in Fig 5. For the last cluster in Fig 5, there are four images, each containing 14 ship instances, for a total of 56 ship instances. Fig 6 shows the statistics of the ship bounding boxes in the dataset, including the coordinate and size distributions. According to the definition of small targets in the MS COCO dataset, targets smaller than 32 × 32 pixels (1024 pixels in area) are called small targets. In our dataset, 21.6% of the objects were smaller than 150 pixels in area, and 85.4% were smaller than 400 pixels. Most instances have a width less than 0.03 of the image width and a height less than 0.01 of the image height, as shown in Fig 7. All instances were evenly distributed throughout the images. The average size of the instances in our dataset was 229 pixels, and the smallest was only four pixels. Thus, our dataset is more challenging than a typical small-target detection problem.
For example, for the second cylinder cluster, there are 421 images, each of which contains two ship targets, totaling 842 ship targets.
3.2. SSL region selection model with YOLOv8
Ships at long distances appear small in the image, usually occupying only several pixels of the whole image, and always lie near the SSL. In contrast, the SSL region is conspicuous and spans the image from left to right, making it easy to detect. Most object detection models can detect the SSL accurately, whereas long-distance ships are difficult to detect. Thus, if the SSL region is selected correctly, ships can be detected using a suitable method.
Because of the excellent characteristics of YOLOv8, which has good speed and accuracy, we adopted it to detect the SSL region in our first stage.
The loss in the SSL region selection model consists of two parts: rectangular box loss and classification loss, where rectangular box loss includes CIOU loss and DFL loss. The total loss is defined as
$$\mathrm{loss}_{total} = a \cdot \mathrm{loss}_{CIOU} + b \cdot \mathrm{loss}_{DFL} + c \cdot \mathrm{loss}_{cls} \tag{1}$$
where $\mathrm{loss}_{CIOU}$ and $\mathrm{loss}_{DFL}$ constitute the rectangular box loss and $\mathrm{loss}_{cls}$ is the classification loss; $a$, $b$, and $c$ are the weight coefficients of each loss. By default, $a$ is 7.5, $b$ is 1.5, and $c$ is 0.5.
BCELoss with sigmoid was adopted to compute losscls, which is denoted as
$$\mathrm{loss}_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\big[y_i \log(p_i) + (1-y_i)\log(1-p_i)\big] \tag{2}$$
where pi is the probability of the i-th ship predicted, and yi is the label of the i-th ship. N represents the total number of ship samples.
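For illustration, the classification loss of Eq (2) can be computed directly as follows. This is a minimal pure-Python sketch; the function name and list-based interface are ours (the actual training code applies the equivalent PyTorch loss over tensors).

```python
import math

def bce_with_logits(logits, labels):
    """Classification loss of Eq (2): sigmoid followed by binary
    cross-entropy, averaged over the N samples."""
    n = len(logits)
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid -> predicted probability p_i
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / n
```

For a logit of 0 (p = 0.5) and a positive label, the loss is log 2 ≈ 0.693, and it approaches 0 as the logit of a positive sample grows.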
The CIOU loss and DFL loss are given by
$$\mathrm{loss}_{CIOU} = 1 - IOU + \frac{\rho^2}{c^2} + \alpha v \tag{3}$$
$$\mathrm{loss}_{DFL} = -\big[(y_{i+1}-y)\log(S_i) + (y-y_i)\log(S_{i+1})\big] \tag{4}$$
where $S_i$ and $S_{i+1}$ denote the predicted probabilities of the left and right integer points of the corresponding anchor point, respectively, and $y$ is the continuous regression target lying between the integer points $y_i$ and $y_{i+1}$. As shown in Fig 8, $IOU$ is the intersection-over-union ratio of the two bounding boxes, $\rho$ is the distance between the center points of the two bounding boxes, $c$ is the distance between the top-left and bottom-right points of the smallest rectangle enclosing the two bounding boxes (i.e., its diagonal length), $v$ measures the consistency of the width-to-height ratios of the two bounding boxes, and $\alpha$ is the weight coefficient of $v$. The four parameters are defined as
$$IOU = \frac{S_1}{S_2},\qquad \rho = \sqrt{(x_p - x_t)^2 + (y_p - y_t)^2},\qquad v = \frac{4}{\pi^2}\left(\arctan\frac{x_{t2}-x_{t1}}{y_{t2}-y_{t1}} - \arctan\frac{x_{p2}-x_{p1}}{y_{p2}-y_{p1}}\right)^2,\qquad \alpha = \frac{v}{(1-IOU)+v} \tag{5}$$
where $(x_p, y_p)$ and $(x_t, y_t)$ are the center points of the predicted and true bounding boxes, respectively; $(x_{p1}, y_{p1})$ and $(x_{p2}, y_{p2})$ are the top-left and bottom-right points of the predicted bounding box, respectively; $(x_{t1}, y_{t1})$ and $(x_{t2}, y_{t2})$ are the top-left and bottom-right points of the true bounding box, respectively; and $S_1$ and $S_2$ are the intersection and union areas of the two bounding boxes, respectively.
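The CIOU terms of Eqs (3) and (5) can be combined into a short, illustrative re-implementation. This is a sketch in plain Python over corner coordinates, with a small epsilon added for numerical stability; it is not the YOLOv8 source code.

```python
import math

def ciou_loss(pred, true, eps=1e-9):
    """CIOU loss of Eq (3) for two boxes given as (x1, y1, x2, y2) corners."""
    xp1, yp1, xp2, yp2 = pred
    xt1, yt1, xt2, yt2 = true
    # Intersection area S1 and union area S2 -> IOU = S1 / S2
    iw = max(0.0, min(xp2, xt2) - max(xp1, xt1))
    ih = max(0.0, min(yp2, yt2) - max(yp1, yt1))
    s1 = iw * ih
    s2 = (xp2 - xp1) * (yp2 - yp1) + (xt2 - xt1) * (yt2 - yt1) - s1
    iou = s1 / (s2 + eps)
    # rho^2: squared distance between the two box centers
    rho2 = ((xp1 + xp2) / 2 - (xt1 + xt2) / 2) ** 2 \
         + ((yp1 + yp2) / 2 - (yt1 + yt2) / 2) ** 2
    # c^2: squared diagonal of the smallest rectangle enclosing both boxes
    cw = max(xp2, xt2) - min(xp1, xt1)
    ch = max(yp2, yt2) - min(yp1, yt1)
    c2 = cw ** 2 + ch ** 2 + eps
    # v penalizes aspect-ratio mismatch; alpha is its trade-off weight (Eq 5)
    v = (4 / math.pi ** 2) * (math.atan((xt2 - xt1) / (yt2 - yt1))
                              - math.atan((xp2 - xp1) / (yp2 - yp1))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is 0; it grows as the boxes drift apart in overlap, center distance, or aspect ratio.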
The SSL generally spans the entire width of the image, with the upper part being the sky and the lower part being the ocean. Because of these distinctive features, we usually resized the input image to 128 × 128 pixels. It achieved almost the same detection accuracy as when using an input size of 640 × 640 pixels, while significantly improving the detection speed. This setting was verified in subsequent experiments.
An apparent advantage of detecting the SSL region is that it eliminates the negative influence of clouds and seabirds in the sky, as well as the influence of waves on the sea when detecting ships near the SSL.
3.3. Ship detection with YOLOv8 and slicing technique
After detecting the SSL region, a suitable method must be selected to detect ships within it. The width of the SSL region is significantly greater than its height, whereas existing object detection models usually require input images of a specified size and are therefore unsuitable for the SSL region. In the YOLO series, an image is resized to the same height and width before being input to the model: for YOLOv2, the input size is 448 × 448 pixels; for YOLOv5 and YOLOv8, it is 640 × 640 pixels. For Faster R-CNN, images are resized to a specified scale before being input to the model, and an SSL region with a large aspect ratio would cause serious distortion of the ship targets, making it unsuitable for training a model. In this study, we therefore adopted the slicing technique: an image is sliced into several patches, all patches of the image are input into the ship object detection model, and the detection results are combined into a single result in the original image.
The slicing technique is illustrated in Fig 9. The width of the SSL region is the same as that of the original input image, whereas its height is usually between a few and dozens of pixels, depending on the size of the ships in it. Thus, we cropped the SSL region into patches whose heights equal that of the SSL region and whose widths are fixed at 128 pixels. These patches are then input into an object detection model. After obtaining all patch results of an SSL region, we combined them into one result, denoting the final detection result of the input image. The slicing technique has an obvious advantage. Ships in the SSL region occupy only several pixels of the original high-resolution image, which is typically much larger than 640 × 640 pixels. If the image were resized to 640 × 640 pixels before being input to the model, the ships in it would become even smaller and more difficult to detect. With the slicing technique, however, the original small ship objects do not need to be shrunk: each patch has the same resolution as the SSL region, and when the patches are input into the object detection model, they are resized to 128 × 128 pixels using interpolation, making the original small ship objects easier to detect. As shown in Fig 10a, a ship over a long distance occupies only 25 × 5 pixels in a high-resolution image of size 1080 × 640. With current methods, it is resized to 13 × 5 pixels before being input into the object detection model, whereas our method preserves the 25 × 5 size, as shown in Fig 10b. Not only are the features of small ships retained, but the influence of the background is also greatly reduced. In this study, we chose YOLOv8 as the object detection model because of its excellent small-object detection accuracy and good detection speed.
(a) Current methods and (b) proposed method.
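The patch layout described above can be sketched as follows. This is an illustrative sketch: the paper fixes only the 128-pixel patch width, so the overlap value here is an assumption, and the function name is ours.

```python
def slice_region(region_w, patch_w=128, overlap=32):
    """Return (x_start, x_end) windows covering an SSL region of width
    region_w with fixed-width patches; consecutive windows overlap so
    that ships straddling a patch boundary are not cut in half.
    The last window is shifted left so every window has full width."""
    step = patch_w - overlap
    windows = []
    x = 0
    while True:
        end = min(x + patch_w, region_w)
        windows.append((max(0, end - patch_w), end))
        if end >= region_w:
            break
        x += step
    return windows
```

Each window is cropped from the SSL region at full resolution, so a ship keeps its original pixels instead of being shrunk with the rest of the image.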
After all the patches of an SSL region are detected, their detection results are combined into a single result displayed in the SSL region. Subsequently, based on the location of the SSL region in the original image, the ships detected in the SSL region are mapped back to the original image, as shown in Fig 2.
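Mapping patch-level detections back to the original image amounts to two coordinate shifts, sketched below with hypothetical names; boxes are (x1, y1, x2, y2) tuples. When patches overlap, duplicate boxes in the overlap are typically merged afterwards, e.g., with non-maximum suppression.

```python
def to_original_coords(patch_boxes, patch_x0, ssl_x0, ssl_y0):
    """Shift boxes detected in a patch (patch-local coordinates) back into
    the original image: first by the patch's horizontal offset inside the
    SSL region, then by the SSL region's own offset in the full image."""
    dx = ssl_x0 + patch_x0
    dy = ssl_y0
    return [(x1 + dx, y1 + dy, x2 + dx, y2 + dy)
            for x1, y1, x2, y2 in patch_boxes]
```

For example, a box at (10, 2, 30, 7) in the second patch (patch offset 128) of an SSL region starting at row 300 lands at (138, 302, 158, 307) in the original image.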
4. Experiments
In this section, we present extensive experiments conducted to demonstrate the superiority of the proposed method. First, the experimental environment and setup are introduced. Subsequently, the metrics are presented. Finally, two types of experiments are presented: methods with and without our two-stage strategy, and comparisons with state-of-the-art models.
4.1. Experimental environment and setup
All experiments were conducted using the PyTorch framework on a computer with a 64-bit Ubuntu 16.04 operating system. The GPU was an NVIDIA GeForce RTX 3090 Ti, and the CPU was an Intel Core i7-10700. The Python version was 3.9.18, the Torch version was 1.8.0, and the CUDA version was 11.1. The specific configurations are listed in Table 2; they are given separately because the proposed method comprises two models with different settings. In the first stage, we adopted YOLOv8 with a batch size of 8 and an input size of 128 × 128 pixels. In the second stage, we adopted YOLOv8 as the ship detection model; the inputs were image patches, with a batch size of 8 and an input size of 128 × 128 pixels. Other hyperparameters in these two stages, including the learning rates and weight decay, were kept at the original default settings of YOLOv8 [30].
4.2. Evaluation criterion
Several state-of-the-art methods have been used to conduct comparative experiments and demonstrate the superiority of the proposed method. Therefore, we have selected an evaluation criterion. Accuracy and speed are the two main metrics used in object detection methods.
Accuracy is measured using the average precision (AP). AP is defined as the area under the precision-recall curve. The precision and recall are expressed as
$$Precision = \frac{TP}{TP + FP} \tag{6}$$
$$Recall = \frac{TP}{TP + FN} \tag{7}$$
where TP is the number of bounding boxes whose predicted labels are the same as the ground truth, FP is the number of bounding boxes whose predicted labels differ from the ground truth, and FN is the number of bounding boxes where the ground truth is not predicted.
For multiclass object detection, the mean of the AP values over all classes is computed and called the mean average precision (mAP). In this study, only the ship class is detected, so mAP reduces to the AP of a single class. The formula for AP is as follows:
$$AP = \int_0^1 P(R)\,dR \tag{8}$$
where $P(R)$ is the precision at recall $R$. AP values range from 0 to 1. The AP varies with the IOU threshold, which ranges from 0.5 to 0.95 in steps of 0.05. Because most instances in our dataset are extremely small, a higher IOU threshold leads to poor detection results. We therefore chose an IOU threshold of 0.5 when computing the AP, written as AP50.
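Concretely, AP50 can be computed from confidence-ranked predictions as the area under the precision-recall curve of Eqs (6)–(8). The sketch below assumes predictions have already been matched to ground truth at IOU ≥ 0.5; the function and variable names are ours.

```python
def average_precision(scored_preds, n_gt):
    """AP (Eq 8) as the area under the precision-recall curve.
    scored_preds: (confidence, is_true_positive) pairs, one per predicted
    box, already matched to ground truth at the chosen IOU threshold.
    n_gt: total number of ground-truth boxes (for recall)."""
    preds = sorted(scored_preds, key=lambda p: -p[0])  # rank by confidence
    tp = fp = 0
    ap, last_recall = 0.0, 0.0
    for _, is_tp in preds:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)   # Eq (6)
        recall = tp / n_gt           # Eq (7)
        ap += precision * (recall - last_recall)  # rectangle under P(R)
        last_recall = recall
    return ap
```

A detector that ranks one false positive between two true positives over two ground-truth boxes scores AP = 1·0.5 + (2/3)·0.5 ≈ 0.833, while a perfect ranking scores 1.0.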
In addition to the AP, we measured the speed of the proposed method. In this study, we computed the inference time for one batch of images to obtain the average time required to handle one image.
4.3. Experiments with and without our two-stage strategy
In this experiment, we evaluated the performance of three typical networks: Faster R-CNN, YOLOv5, and YOLOv8. We applied our two-stage strategy to each of the three networks to demonstrate its effectiveness. That is, YOLOv8 was first used to detect the SSL region, and then Faster R-CNN, YOLOv5, or YOLOv8 with the slicing technique was used to detect ships in the SSL region.
In the first stage, the size of the input image is set to 128 × 128 pixels. As shown in Table 3, when the input size is set to 640 × 640 pixels, the AP50 and AP50:95 for the SSL using our method are 0.99 and 0.66, respectively. When the input size is reduced to 128 × 128 pixels, the AP50 and AP50:95 become 0.967 and 0.658, slightly lower than with 640 × 640 pixels; however, the detection speed is doubled, from 1.6 ms to 0.8 ms. Although the ship becomes almost invisible in the resized 128 × 128 images, the resizing has little impact on the features of the SSL region, which always spans the entire image and serves as the boundary between the sky and the sea. This leads to a significant improvement in detection speed compared with an input size of 640 × 640 pixels. Fig 11 shows the visualization of the SSL detection results under various conditions. When the weather is overcast, the SSL becomes slightly blurry but can still be correctly detected using YOLOv8, as shown in Fig 11a–11c. When the light is dim and there are clouds near the SSL, the features of the SSL become less obvious, which increases the difficulty of SSL detection; under this condition, YOLOv8 can also detect the SSL correctly, as shown in Fig 11d and 11e. If a glow appears when the light above the sea is dim, the features of the SSL become slightly more obvious, as shown in Fig 11f and 11g. In addition to clouds, waves and seabirds can affect ship detection. Since YOLOv8 detects the SSL correctly, as shown in Fig 11i, seabirds outside the SSL region can be filtered out, thereby improving the accuracy of ship detection.
(a) overcast; (b) overcast and foggy; (c) overcast and existing waves; (d)-(e) nightfall and cloudy; (f)-(g) morning and existing morning glow; (h) cloudy and existing waves; (i) existing seabirds.
In the second stage, we detected ships in the SSL region using Faster R-CNN, YOLOv5, and YOLOv8 combined with the slicing technique, to demonstrate the effectiveness of our two-stage strategy. During this stage, the input size of the image patches is set to 128 × 128 pixels. The results are summarized in Table 4. Faster R-CNN with our two-stage strategy achieved better results than the original model, with improvements of 0.041 in AP50:95 and 0.122 in AP50. For the YOLO series, YOLOv5 combined with the proposed strategy significantly outperformed the baseline YOLOv5, with increases of 0.31 and 0.411 in AP50:95 and AP50, respectively. YOLOv8 combined with the strategy performed best, reaching 48% AP50:95 and 85% AP50. By inputting image patches into the detection model instead of entire images, the ship features present in the original high-resolution images were retained; in the original models, these features were compressed when the high-resolution images were resized to 640 × 640 pixels. Thus, detection accuracy was significantly improved by our two-stage strategy. The detection time increased because the input changed from one image to multiple patches: a high-resolution SSL region is divided into several patches, each of which must be processed. As shown in Table 4, the detection time with our two-stage strategy was approximately six times the original, or even more. For the two-stage YOLOv8, the detection speed is 75 ms per image, which has little impact on the visual perception of intelligent ships, where ship detection is carried out at a certain interval. Additionally, a terminal carrying a high-performance graphics card can perform real-time detection under these conditions. These experimental results demonstrate the effectiveness of the proposed two-stage strategy.
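The slicing step can be illustrated with a minimal sketch. The patch size, overlap, and helper names below are illustrative assumptions, not the paper's exact slicing parameters; the paper reports those in its experimental setup.

```python
def slice_region(x1, y1, x2, y2, patch=128, overlap=32):
    """Divide an SSL region into overlapping square patches.

    Returns (px1, py1, px2, py2) patch boxes in original-image coordinates.
    The patch size and overlap here are illustrative values.
    """
    step = patch - overlap
    patches = []
    px = x1
    while True:
        py = y1
        while True:
            patches.append((px, py, min(px + patch, x2), min(py + patch, y2)))
            if py + patch >= y2:
                break
            py += step
        if px + patch >= x2:
            break
        px += step
    return patches

def remap(box, patch_origin):
    """Shift a detection from patch coordinates back to image coordinates."""
    bx1, by1, bx2, by2 = box
    ox, oy = patch_origin
    return (bx1 + ox, by1 + oy, bx2 + ox, by2 + oy)
```

For example, a 1080-pixel-wide, 128-pixel-high SSL region yields 11 overlapping 128 × 128 patches with a 32-pixel overlap; detections in each patch are then remapped into original-image coordinates before the results are merged.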
4.4. Comparison with the state-of-the-art
We compared our method with several state-of-the-art object detection models, including TPH-YOLOv5, YOLO-Fastestv2, YOLOv8, and YOLOv10, to further demonstrate its superiority. The results are summarized in Table 5. Because YOLOv8 improves on YOLOv5, making the network both lightweight and accurate, it exceeds YOLOv5 by 0.411 in AP50. YOLOv8 also clearly outperforms Faster R-CNN, CenterNet, and SSD, although it is surpassed by TPH-YOLOv5, YOLO-Fastestv2, and YOLOv10. However, TPH-YOLOv5 and YOLO-Fastestv2 had longer inference times than YOLOv8. Given the high real-time performance of YOLOv8, with an inference time of just 2.1 ms per image, we selected it as the detector in both the first and second stages. Our proposed method is tested under two settings: two-stage YOLOv8-a and two-stage YOLOv8-b, with first- and second-stage input sizes of 640+128 and 128+128 pixels, respectively.
In terms of AP50, two-stage YOLOv8-a and two-stage YOLOv8-b achieved 87% and 85%, respectively, exceeding YOLOv8 by 0.111 and 0.131. In addition, our method is more accurate than the other state-of-the-art models. Specifically, in terms of AP50, the proposed two-stage YOLOv8-b exceeded YOLOv10 by 0.1, and exceeded TPH-YOLOv5 and YOLO-Fastestv2 by 0.078 and 0.04, respectively. Its accuracy is much higher than those of Faster R-CNN, YOLOv5, SSD, and CenterNet, as shown in Fig 12, demonstrating the superiority of the proposed method. The inference time per image of two-stage YOLOv8-b is 75 ms, faster than YOLO-Fastestv2 but slower than all other models. However, the improvement in accuracy is significant, and for a device with a powerful GPU the proposed two-stage YOLOv8 can operate in real time, so the 75-ms inference time has little impact. Therefore, our method achieves a better tradeoff between accuracy and speed, making it more suitable for long-distance ship detection at sea.
Since recall must also be considered, and YOLOv8, YOLO-Fastestv2, and YOLOv10 are superior to the other compared methods, we compared these three methods with ours in terms of both precision and recall. The results are shown in Fig 13. In the second stage of our method, the AP50 of ships in sliced patches is 0.984; after relocating the ships within the SSL region, the AP50 is 0.879. In the first stage, the AP50 of the SSL region is 0.967. Thus, the joint AP50 for ships in the original images is 0.85 (= 0.967 × 0.879). Our method outperforms YOLOv8, YOLO-Fastestv2, and YOLOv10 in terms of the precision-recall curve, demonstrating superior performance.
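The joint score is simply the product of the two stage-level scores, since a ship is detected end-to-end only if both stages succeed:

```python
# Joint AP50 of the pipeline, taken as the product of the per-stage scores
# reported above: SSL-region detection (stage 1) and ship detection after
# relocation within the SSL region (stage 2).
ssl_ap50 = 0.967
ship_ap50 = 0.879
joint_ap50 = ssl_ap50 * ship_ap50
print(round(joint_ap50, 2))  # 0.85
```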
(a) YOLOv8; (b) YOLO-Fastestv2; (c) the first stage of our method with an input size of 128; (d) the second stage of our method; (e) YOLOv10.
Because YOLOv8, YOLO-Fastestv2, and YOLOv10 achieved better accuracy than Faster R-CNN, YOLOv5, SSD, CenterNet, and TPH-YOLOv5, we visualized the detection results of these three models and our method, as illustrated in Fig 14. It can be seen that our method correctly detects long-distance ships occupying only a few pixels. As shown in Fig 14a, the ship target has a pixel size of 17 × 3; YOLOv8 and YOLOv10 cannot detect it, whereas YOLO-Fastestv2 and the proposed method can. In Fig 14b, YOLOv8 and YOLOv10 detected only the three larger targets, whereas YOLO-Fastestv2 and our method detected all of them, apart from one incorrect detection. In Fig 14c, because the left target is only 8 × 2 pixels, YOLO-Fastestv2 detects only the larger target, and YOLOv8 and YOLOv10 miss both. Under the influence of seabirds, YOLOv8 misdetects a seabird as a ship and misses the first target, while YOLO-Fastestv2 and YOLOv10 also fail to detect the first target; our method correctly detects all the targets, as shown in Fig 14d. Most of the seabirds in the sky were excluded from second-stage ship detection because the first stage restricts detection to the SSL region; therefore, the proposed method reduces seabird misidentification. In addition, the features of small targets were preserved by the slicing technique, whereas for YOLOv8, YOLOv10, and YOLO-Fastestv2, many features of small targets in the original high-resolution images disappeared after the images were resized to 640 × 640 pixels. This demonstrates the superiority of the proposed method in preserving the features of long-distance ships.
(a) there is one ship with a pixel size of 17 × 3; (b) there are six ships; (c) there are two ships, the smallest of which is only 8 × 2 pixels; (d) there are five ships, with many seabirds around the SSL.
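The feature loss caused by resizing can be quantified with a small sketch. Assuming a YOLO-style aspect-ratio-preserving (letterbox) resize, which is an assumption about the baselines' preprocessing, scaling a 1080 × 640 frame to a 640 × 640 input shrinks the 17 × 3-pixel ship of Fig 14a to roughly 10 × 2 pixels:

```python
# Estimate how many pixels of a small ship survive YOLO-style letterbox
# resizing (aspect ratio preserved, so the scale is the smaller ratio).
def letterbox_scale(img_w, img_h, target=640):
    return min(target / img_w, target / img_h)

s = letterbox_scale(1080, 640)               # ~0.593
ship_w, ship_h = 17, 3                       # ship size from Fig 14a
print(round(ship_w * s), round(ship_h * s))  # 10 2
```

At two pixels of height, almost no discriminative feature remains, which is why the slicing technique feeds un-shrunk patches to the detector instead.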
5. Conclusions
Precise and reliable ship detection methods are crucial for the visual perception of intelligent ships. A ship far from the intelligent vessel appears very small in the visual image, posing challenges for current object detection models. In this study, we combined the advantages of traditional methods and deep-learning-based methods and proposed a novel method for ship detection over long distances. Based on the popular object detection model YOLOv8 and the slicing technique, we designed a two-stage method. In the first stage, YOLOv8, which has excellent performance, detects the SSL region containing potential ship objects. In the second stage, another YOLOv8 combined with the slicing technique detects ships within the SSL region obtained in the first stage. We constructed a ship dataset of 1871 images captured with a high-definition camera from a distant perspective at sea. With the two-stage structure, the accuracies of Faster R-CNN, YOLOv5, and YOLOv8 improved significantly, although the detection speed degraded to some extent; for a GPU device with good performance, the inference-time loss is negligible. This demonstrates the effectiveness of the proposed two-stage strategy. In addition, our method outperformed other state-of-the-art models in terms of accuracy and exhibited a good trade-off between accuracy and speed. This makes deep-learning methods reliable when applied to the visual perception of intelligent ships.
However, our approach may fail in several extreme cases, such as rain, other objects near the SSL, and extremely small ships against a background identical to the sky. As shown in Fig 15a, a ship in the rain appears very blurry; in this case, it can scarcely be discerned by the human eye, let alone by machine vision. In addition, if other objects lie near the SSL, they may be misdetected as ships because they share the features of real ships. When some ships have the same background as the sky, they blend into it and may not be detected, as shown in Fig 15b. Thus, the camera should be installed in a place with a wide field of view so that objects on the ship do not appear in the view, and automatic ship detection should be avoided on rainy days. In future studies, we will further improve the detection speed of this method for deployment on embedded devices. At the same time, to address the above failure cases, we will try to further improve the accuracy of the method by using multiple sensors.
(a) ships are in the rain; (b) other objects exist near the SSL and some ships have the same background as the sky.
Acknowledgments
The authors thank Elsevier Language Editing Services for polishing the language of this manuscript, and the editors and reviewers for their valuable suggestions.