Improving fishing ground estimation with weak supervision and meta-learning

  • Kazuki Takasan,

    Contributed equally to this work with: Kazuki Takasan, Masaaki Iiyama

    Roles Data curation, Software, Validation, Visualization, Writing – original draft

    Affiliation Graduate School of Data Science, Shiga University, Hikone, Shiga, Japan

  • Masaaki Iiyama

    Contributed equally to this work with: Kazuki Takasan, Masaaki Iiyama

    Roles Conceptualization, Data curation, Funding acquisition, Methodology, Project administration, Supervision, Writing – review & editing

    iiyama@iiyama-lab.org

    Affiliation Graduate School of Data Science, Shiga University, Hikone, Shiga, Japan

Abstract

Estimating fishing grounds is an important task in the fishing industry. This study modeled the fisher’s decision-making process based on sea surface temperature patterns as a pattern recognition task. We used a deep learning-based keypoint detector to estimate fishing ground locations from these patterns. However, training the model required catch data for annotation, the amount of which was limited. To address this, we proposed a training strategy that combines weak supervision and meta-learning to estimate fishing grounds. Weak supervision involves using partially annotated or noisy data, where the labels are incomplete or imprecise. In our case, catch data cover only a subset of fishing grounds, and trajectory data, which are readily available and larger in volume than catch data, provide imprecise representations of fishing grounds. Meta-learning helps the model adapt to the noise by refining its learning rate during training. Our approach involved pre-training with trajectory data and fine-tuning with catch data, with a meta-learner further mitigating label noise during pre-training. Experimental results showed that our method improved the F1-score by 64% compared to the baseline using only catch data, demonstrating the effectiveness of pre-training and meta-learning.

Introduction

Estimating fishing grounds, the areas where fish are most likely to be found and caught, is an important task in the fishing industry. Accurate estimation enhances the efficiency of fishing activities by guiding fishers to areas with higher catch potential, thus optimizing resources such as fuel and reducing operational costs. Traditionally, fishers rely on empirical methods that consider sea environmental information such as sea surface temperature (SST). However, the accuracy of these methods remains inconsistent due to the complexity of marine environments and variations in fishers’ skill.

Previous studies have employed habitat suitability index (HSI) models [1–4] and statistical approaches such as generalized additive models (GAM) and maximum entropy models [5–10] to understand the relationship between sea environmental variables and fishing grounds. These methods typically focus on point-wise environmental variables, which fail to capture the broader, regional cues fishers use in their decision-making processes.

Recent advances in pattern recognition have inspired novel approaches that treat fishing ground estimation as a pattern recognition task. Our previous work [11,12] demonstrated the effectiveness of keypoint detection methods for fishing ground estimation by detecting regional SST patterns. These approaches emulate the process by which fishers implicitly read the two-dimensional pattern from the SST map to determine the fishing grounds. This can be defined as a general image recognition task, and deep learning-based methods have shown high performance. However, a significant challenge remains due to the lack of reliable annotated data for model training. Due to the scarcity of accurate catch data, the models cannot be adequately trained, resulting in reduced prediction accuracy.

In this paper, we propose a novel approach to tackle this challenge through weakly supervised pre-training and meta-learning. To compensate for the scarcity of precise catch data, we use fishing vessel trajectory data as weak supervision. While trajectory data is less accurate as an indicator of fishing grounds, it is widely available and provides useful patterns related to fishing activity. In the pre-training phase, the model learns these broader patterns from the trajectory data, which helps it learn characteristics of SST patterns. This process strengthens the model’s performance, allowing it to perform better when fine-tuned on limited but more accurate catch data. This approach is based on our previous preliminary study [13], and the current study aims to refine and enhance the methodology.

Additionally, to mitigate the effects of noisy labels that exist in trajectory data, we introduce meta-learning. Trajectory data may include locations where vessels visited but did not engage in fishing activities, made no catch, or areas unsuitable for the target fish species, leading to noisy labels that do not accurately represent fishing grounds. Inspired by previous work [14], we employ a meta-learner to help the model adapt to these unreliable labels during pre-training. The meta-learner assesses label reliability, ensuring that noisy or misleading data has less impact on the model’s learning process. While our study is consistent with their settings, focusing on many unreliable labels and a few reliable ones, our approach differs in that we address a detection task rather than the classification task they target. This distinction requires an adapted meta-learning strategy for effective implementation. As a result, our model exhibits improved robustness and enhanced performance when fine-tuned with more accurate catch data.

Our main contributions are summarized as follows.

  • Our proposed method addresses the scarcity of annotated data by using trajectory data during pre-training, making the model more adaptable to real-world scenarios where available catch records are limited.
  • Furthermore, the integration of meta-learning enhances the model’s robustness by reducing the impact of noisy labels during pre-training, ensuring improved performance in the face of uncertainties associated with weakly supervised data.
  • A new application of weakly supervised learning and meta-learning in keypoint detection is presented, allowing for novel approaches to insufficient data sources prevalent in various domains.

Related work

Fishing ground estimation

Ecological and statistical approach.

Since the development of remote-sensing technology, several studies have attempted to estimate the location of fishing grounds using ocean environmental conditions such as SST and sea surface height (SSH).

Studies with an ecological approach [1–4] employ the HSI model, a conceptual framework that explains the relationship between environmental factors and the distribution of a given species. It quantifies the suitability of an environment for a species at a particular location on a scale from 0 to 1; this value is referred to as the HSI.

In the statistical approaches [5–10], GAMs or maximum entropy models are used to solve a regression problem with catch per unit effort (CPUE) as the objective variable. In recent years, machine learning models such as support vector machines (SVM), random forests, and neural networks have been used and have demonstrated improved performance in this task [15–19].

Our study differs from most previous studies in that it predicts daily fishing grounds, whereas most previous studies have predicted fishing grounds only on a seasonal or monthly basis. Predicting daily changes in fishing grounds requires a large amount of data, which is difficult to collect; many studies have therefore focused on seasonal predictions, which can be analyzed with less data. We instead propose a method that compensates for the lack of data by using weakly supervised learning to make daily predictions.

Pattern recognition approach.

Whereas traditional methods focus on point-wise environmental variables, fishers often rely on broader, region-wise environmental cues from their surrounding area, such as eddies and tidal patterns, to find fishing grounds.

In our previous work [20], we used a two-dimensional SST map as an input for predicting fishing grounds using SVM and spectral clustering. Furthermore, in [11–13], fishing ground estimation was based on image recognition techniques, including object and keypoint-detection models, using 2D sea environmental data such as SST. While these approaches can incorporate information from a larger area, they face the challenge of partial annotation, where only some of the ground-truth points in the image are labeled. We addressed this issue by creating a specialized loss function [12] and expanding the supervised data using fishing vessel trajectories [13]. In this study, we extend the use of trajectory data by treating it as weak supervision for model pre-training.

Application of fishing vessel trajectories.

Fishing vessel trajectory data have been used to identify fishing activities, including the detection of illegal fishing [21–23]. Researchers have also used these trajectories to estimate the location of fishing grounds. Previous studies [9,18,19,24] classified locations as fishing grounds or non-fishing grounds based on the speed of vessels and subsequently labeled these areas accordingly. In this study, we employ a similar approach to extract potential fishing ground locations from the trajectory data, using this data as weak supervision. Although trajectory data is inherently noisy, it serves as a large-scale source of information about fishing activities. A similar strategy has been successfully applied in other domains, such as vision-language representation learning, where noisy data (e.g., image-alt text pairs) was used to train models with impressive results despite the noise [42]. In our case, despite the noise in the trajectory data, its large-scale nature makes it a valuable signal for learning fishing ground locations.

Keypoint detection

Keypoint detection is the task of detecting characteristic points in an image. A typical application of this task is human pose estimation, which detects human joints in an image and estimates human posture. Since the development of deep learning techniques, methods using deep neural networks (DNN) or convolutional neural networks (CNN) have been proposed [25–31].

There are two approaches for estimating keypoints: direct regression of keypoint coordinates [25–27] and heatmap regression [28–31]. In the latter approach, a 2D Gaussian kernel is applied to the ground-truth coordinates to create a ground-truth heatmap, and the loss between this heatmap and the output heatmap is minimized. In addition to being easier to implement and more accurate [32], heatmap regression is more suitable for fishing ground estimation, given that the output heatmaps are used as recommendation maps for fishers. Therefore, we set fishing grounds as keypoints, estimate the locations from 2D SST patterns as input images, and output the estimated locations as heatmaps. We do not consider the connection between the keypoints, and only use the output heatmap for the estimation.

Weakly supervised learning

Weakly supervised learning uses incomplete, inexact, or inaccurate supervision [33]. This approach includes situations where only a subset of the dataset is labeled (also referred to as semi-supervised learning [34]), situations with inaccurate labels (referred to as noisy labels [35]), and situations with coarse-grained labels, such as assigning labels to entire images rather than individual objects within the images [36–38]. Partially annotated data, characterized by missing labels for some objects in images, is a form of incomplete supervision and therefore falls under weakly supervised learning. Researchers have addressed this challenge by adapting the loss function [39], employing specific sampling techniques [40], and applying self-supervised techniques [41]. The annotated data used in our study is partially annotated, either without noise (catch data) or with noisy labels (trajectory data), since catch data do not cover all potential fishing grounds, and the representation of fishing ground locations through trajectory data is inaccurate.

In situations where there are numerous weakly labeled data and few strongly labeled data, a typical approach involves initial pre-training with weak labels followed by fine-tuning with strong labels [42–45]. This strategy is commonly seen in transfer learning, where models are pre-trained on a large, often general-purpose dataset, and then fine-tuned on a smaller, domain-specific dataset to adapt the model to the specific task. In our case, however, the available labeled data are weak and imprecise, and we lack access to the sufficiently large pre-training datasets typical in transfer learning, making weakly supervised learning more suitable. Noisy labels during pre-training can nevertheless degrade the model’s performance. To address this issue, we introduce the meta-learning method of [14], which adjusts the influence of noisy labels in weakly annotated data during training. That work follows the approach proposed by Andrychowicz et al. [46], expanding the application of meta-learning to weakly supervised learning.

Methods

Keypoint detection as fishing ground estimation

In this research, we addressed the challenge of accurately determining the locations of productive fishing grounds using SST maps, framing this task as keypoint detection. This approach is inspired by the estimation process of fishers, who use 2D patterns of SST such as eddies and tides to find good fishing grounds. As in human pose estimation, a classic example of keypoint detection, we treated fishing grounds as keypoints and estimated their locations. The input for this task is a 2D image corresponding to a two-dimensional representation of SST, as Fig 1 shows.

Fig 1. SST map as an input image.

The coloring is for visualization; the image used for training is a two-dimensional array containing the SST itself.

https://doi.org/10.1371/journal.pone.0321116.g001

Instead of directly predicting the coordinates of the keypoints, our model outputs a prediction heatmap, as Fig 2 shows. Areas with higher values on this heatmap indicate a higher likelihood of being fishing grounds and are thus suggested to fishers as potential fishing locations. The model’s training requires a ground-truth heatmap constructed using catch location and catch amount data (hereafter referred to as “catch data”) and fishing vessel trajectory data (hereafter referred to as “trajectory data”). The following section describes the detailed methodology of this annotation process.

Fig 2. Output heatmap.

The areas with higher values in the heatmap indicate a higher likelihood of fishing grounds.

https://doi.org/10.1371/journal.pone.0321116.g002

Overview of our method

Fig 3 illustrates an overview of the proposed method. Our method consists of two phases: pre-training and fine-tuning. In the pre-training phase, the model is first trained with heatmaps generated from trajectory data. In the fine-tuning phase, training continues with heatmaps generated using catch data.

Fig 3. Overview of the proposed method.

Our method involves pre-training with trajectory data as weak supervision and fine-tuning with catch data as strong supervision. In the pre-training phase, meta-learning is employed to reduce the effect of noisy labels.

https://doi.org/10.1371/journal.pone.0321116.g003

During pre-training, we applied the meta-learning method proposed by Mostafa et al. [14] to control the contribution of noisy labels in the trajectory-based training data, which include unsuitable fishing locations (those with no catch or effort) and areas that do not correspond to the target species’ habitat. In their method, the network weights w are updated as follows:

$$w_{t+1} = w_t - \frac{\eta_t}{b} \sum_{i=1}^{b} c_\theta(x_i, \tilde{y}_i)\, \nabla L\big(f_{w_t}(x_i), \tilde{y}_i\big) \qquad (1)$$

where $c_\theta$ is the meta-learner, named the confidence network, which outputs the confidence for a pair of an input instance $x_i$ and a weak label $\tilde{y}_i$. This learner calibrates the learning rate for each label. $f_w$ is the learner for the target task, named the target network. L is the loss function, $\eta_t$ is the global learning rate, and b is the batch size. Since their method assumes a classification task and cannot be directly applied to our target task, we redefined the ground-truth confidence and the architecture of the confidence network. The Application of meta-learning section provides the details of the application method.
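The update described above scales each sample's contribution by the output of the confidence network. The following is a minimal sketch of one such step, assuming plain mini-batch SGD over a flat list of weights; the function name `confidence_weighted_step` and the toy gradients are illustrative and are not taken from the paper or from [14].

```python
def confidence_weighted_step(w, grads, confidences, lr):
    """One SGD step in which each sample's gradient is scaled by its confidence.

    w           -- list of floats, current weights
    grads       -- per-sample gradients, one list of floats per sample
    confidences -- per-sample confidence scores in [0, 1]
    lr          -- global learning rate
    """
    b = len(grads)
    if b == 0 or len(confidences) != b:
        raise ValueError("need one confidence per sample gradient")
    new_w = list(w)
    for g, c in zip(grads, confidences):
        for j, g_j in enumerate(g):
            # A confidence of 0 removes the sample from the update entirely.
            new_w[j] -= lr / b * c * g_j
    return new_w


# Two samples: the second has a noisy label and receives zero confidence.
w = [1.0, -2.0]
grads = [[0.5, 0.0], [4.0, 4.0]]
updated = confidence_weighted_step(w, grads, confidences=[1.0, 0.0], lr=0.1)
```

With a confidence of zero, the large gradient of the second sample leaves the weights untouched, which is the intended effect of down-weighting unreliable weak labels.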

Data

We use three types of data: SST, catch data, and fishing vessel trajectories.

SST is daily grid data based on satellite observations. An example of an SST map is shown in Fig 4. Grids corresponding to land areas are treated as missing values and filled with zeros. We use two-dimensional SST patterns as input because characteristic SST patterns, such as eddies and tides, can provide insights into estimating fishing grounds and suitable water temperatures for the target fish species.

Fig 4. Example of a sea surface temperature map and fishing ground (red circle).

https://doi.org/10.1371/journal.pone.0321116.g004

Catch data, based on fishing logbooks, are tabular data that record the operation date, geographic coordinates (latitude and longitude), and CPUE of each fishing activity. Fig 4 shows an illustrative example where a red circle represents the location of a fishing ground. These data are strong labels derived directly from fishing vessels. In this study, we use them to annotate ground-truth fishing grounds for the input images.

The fishing vessel trajectories are derived from the automatic identification system (AIS), which records an identification code along with historical data on the location of the fishing vessels, including date, location, and duration of stay. Fig 5 provides an illustrative example of this data. In this study, annotations derived from these data serve as weak labels with low certainty. While the trajectories contain only information on the location of the fishing vessels and no information on the catch or location of the catch, it is possible to infer locations that are non-operational or likely to have been operational based on the speed of movement and movement patterns. For example, suppose a trajectory such as those shown in Fig 5 is a straight line whose length equals a full day’s travel distance. In that case, it is unlikely that any fishing operation took place along that path, and the existence of a fishing ground can be rejected. Conversely, if the daily distance of a trajectory is short, or if the trajectory shows repeated movement to and from the same point, it can be inferred with a high degree of certainty that fishing activity has occurred.

Fig 5. Examples of fishing vessel trajectories.

A red line indicates the daily trajectory of each vessel.

https://doi.org/10.1371/journal.pone.0321116.g005

Annotation

Training a keypoint-detection model requires ground-truth heatmaps representing the likelihood of the presence of keypoints. Our proposed approach generates heatmaps where the fishing ground location has a high value, which gradually decreases as the distance from that location increases, as in heatmaps for human pose estimation.

As in conventional methods, the ground-truth heatmaps are created with the two-dimensional Gaussian kernel in Eq (2):

$$H(p) = \exp\left(-\frac{\lVert p - \hat{p} \rVert_2^2}{2\sigma^2}\right) \qquad (2)$$

where p represents the image coordinates, $\hat{p}$ represents the ground-truth coordinates, and σ represents the standard deviation.
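As an illustration of how a two-dimensional Gaussian kernel turns ground-truth coordinates into a heatmap, here is a minimal sketch in pure Python. It assumes an unnormalized kernel with peak value 1 and merges overlapping bumps with a pixel-wise maximum; both choices, and the function name `gaussian_heatmap`, are assumptions rather than details given in the text.

```python
import math


def gaussian_heatmap(height, width, centers, sigma):
    """Return a height x width heatmap with a Gaussian bump at each center.

    centers -- iterable of (row, col) ground-truth coordinates in pixels
    sigma   -- standard deviation of the kernel, in pixels
    Overlapping bumps are merged with a pixel-wise maximum (an assumption).
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


# One fishing ground at row 10, column 12 on a small 32 x 32 grid.
hm = gaussian_heatmap(32, 32, centers=[(10, 12)], sigma=7.0)
```

The value is 1 at the ground-truth pixel and decays smoothly with distance, so one standard deviation away it has dropped to exp(-0.5).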

To utilize positive examples (fishing ground class) and negative examples (non-fishing ground class) for model training, we define labels associated with the catch and trajectory data as shown in Table 1. Speed refers to the distance between the start and end points of a fishing vessel’s daily activity. The labels “Good” and “Bad” indicate the quality of fishing grounds based on catch data, while the other labels are associated with trajectory data. “Unlikely” refers to a location with a significant daily travel distance where no fishing operation is estimated to have occurred. “Unknown” denotes a location where none of the above cases are present, and no distinction can be made between “Good” and “Bad” fishing grounds.
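The trajectory labels can be sketched as a simple rule on the daily start-to-end displacement. The example below is a hedged illustration only: the threshold `unlikely_km` is a hypothetical parameter (the actual criteria are those of Table 1, which is not reproduced here), and the great-circle distance is computed with the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trajectory_label(track, unlikely_km):
    """Label one vessel-day from its start-to-end displacement.

    track       -- list of (lat, lon) positions in time order
    unlikely_km -- hypothetical threshold above which the day counts as transit
    """
    (lat0, lon0), (lat1, lon1) = track[0], track[-1]
    displacement = haversine_km(lat0, lon0, lat1, lon1)
    return "Unlikely" if displacement >= unlikely_km else "Unknown"


transit = [(35.0, 140.0), (35.0, 142.5), (35.0, 145.0)]  # long straight run
loiter = [(35.0, 140.0), (35.1, 140.1), (35.0, 140.05)]  # stays near one spot
labels = [trajectory_label(t, unlikely_km=150.0) for t in (transit, loiter)]
```

Using only the first and last positions matches the definition of speed given above, but it ignores movement patterns such as repeated visits to the same point, which a fuller rule could also take into account.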

As discussed in the previous section, fishing vessel trajectories provide insights into potential fishing grounds, and locations unlikely to be fishing grounds. The catch data also provide information on locations where no catches were made or where catches were deemed unlikely, despite the occurrence of fishing operations.

In essence, the catch data and fishing trajectory data can identify three categories: (1) those that indicate instances where the fishery was operational and a certain quantity of fish were captured (good fishing grounds), (2) those that reflect instances where the fishery was operational or where it relocated to a particular location for operation but resulted in no fish being captured (bad fishing grounds), and (3) those that indicate instances where the fishery was never considered for operation initially (inappropriate fishing grounds). The proposed approach learns three types of heatmaps, one for each case, to benefit from such negative examples. More precisely, the ground-truth heatmap comprises three channels corresponding to classes of fishing grounds. The classes are defined as “Good,” “Bad,” and “Unlikely,” and the association with the label is determined as presented in Table 2. While only the “Good” channel is used for evaluation, incorporating multi-task learning is expected to improve the generalization performance.
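A three-channel ground-truth heatmap of this kind can be assembled as sketched below. The label-to-channel mapping is an assumption inferred from the description of Fig 6 (trajectory heatmaps are identical in the “Good” and “Bad” channels, and the “Unlikely” channel of catch heatmaps is all zeros); the authoritative mapping is Table 2, which is not reproduced here. The name `build_target` is illustrative.

```python
import math

CHANNELS = ("Good", "Bad", "Unlikely")

# Assumed mapping from point labels to heatmap channels (see lead-in).
LABEL_TO_CHANNELS = {
    "Good": ("Good",),
    "Bad": ("Bad",),
    "Unknown": ("Good", "Bad"),
    "Unlikely": ("Unlikely",),
}


def build_target(height, width, points, sigma):
    """Build a dict {channel: 2D list} from labelled points.

    points -- list of (row, col, label) tuples
    """
    target = {c: [[0.0] * width for _ in range(height)] for c in CHANNELS}
    for cy, cx, label in points:
        for channel in LABEL_TO_CHANNELS[label]:
            plane = target[channel]
            for y in range(height):
                for x in range(width):
                    d2 = (y - cy) ** 2 + (x - cx) ** 2
                    value = math.exp(-d2 / (2.0 * sigma ** 2))
                    plane[y][x] = max(plane[y][x], value)
    return target


# Weak target from trajectory labels, strong target from catch labels.
weak = build_target(16, 16, [(4, 4, "Unknown"), (10, 12, "Unlikely")], sigma=2.0)
strong = build_target(16, 16, [(4, 4, "Good")], sigma=2.0)
```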

We prepare two types of ground-truth heatmaps: a heatmap using only catch data for fine-tuning and a heatmap using only trajectory data for pre-training, as shown in Fig 6. The locations with larger values in the heatmaps indicate a higher likelihood of the existence of fishing grounds.

Fig 6. Examples of ground-truth heatmaps.

The heatmaps with the trajectory data are the same for both the “Good” and “Bad” channels. The “Unlikely” channel in the heatmap with the catch data contains only zeros.

https://doi.org/10.1371/journal.pone.0321116.g006

Keypoint detection model

For our fishing ground estimation, we employed Lightweight OpenPose [30], a streamlined version of OpenPose [29]. Note that our method is designed to be model-agnostic, so in principle it can be combined with any backbone architecture. Lightweight OpenPose was chosen for computational feasibility in our setting. The original OpenPose architecture outputs both keypoint heatmaps and part affinity fields to capture pairwise relations between keypoints. However, in our method, we focus exclusively on the keypoint heatmaps, which predict the likelihood of fishing grounds at each location on the SST map.

The model is composed of multiple convolutional layers that progressively refine the heatmaps across stages. Both the input SST maps and the output heatmaps are resized to maintain spatial consistency as Fig 7 shows. This approach leverages the efficiency of the lightweight architecture while focusing on keypoint detection for fishing ground estimation.

Fig 7. Model input and output shapes.

The input and output images are resized to fit the network architecture.

https://doi.org/10.1371/journal.pone.0321116.g007

Application of meta-learning

The work by Mostafa et al. [14] is designed for a classification task and is not directly applicable to fishing ground estimation, which is a detection task. Our method therefore redefines “confidence” and redesigns the confidence network.

In [14], ground-truth confidence is represented by a binary label that indicates whether a weak label is correct. In our case, however, it is not feasible to categorize the accuracy of the weak label into a binary format, and the strong label y does not necessarily correspond to the truth. Consequently, we defined the confidence as the degree of validity of the input-weak label pair, as follows:

(3)

where y denotes the heatmap, a two-dimensional array, derived exclusively from the catch data, while $\tilde{y}$ denotes the heatmap derived solely from the trajectory data. Under this definition, confidence diminishes when non-zero pixels in y coincide with zeros in $\tilde{y}$, as seen at the bottom of Fig 9. This design reflects the higher reliability of y as a source of fishing ground locations. Conversely, confidence increases when pixels are zero in y but non-zero in $\tilde{y}$, as indicated at the top of Fig 9. Regions with zero pixels in y are excluded from the confidence evaluation; this choice rests on the assumption that y omits unobserved fishing grounds, so that the labeled region appears smaller than its actual extent. We adopted the same approach in the loss function introduced by [12]. The weighting factor in Eq (3) addresses the empirical observation that models trained only with catch data tend to generate smaller regions than the actual fishing grounds. This factor, proportional to the count of non-zero pixels in the weak label heatmap $\tilde{y}$, raises the confidence for samples with a more extensive non-zero region. The objective is to make the model prioritize learning from such samples, so that it generates larger and more realistic fishing ground regions in its output.

Fig 8 shows the histogram of the confidence values in our dataset; approximately one-third of the samples have a confidence higher than 0.9.

Fig 9. Examples of correspondence between confidence and each ground-truth heatmap.

https://doi.org/10.1371/journal.pone.0321116.g009

The confidence network has an architecture as shown in Fig 10. The architecture is designed for two-class classification (with soft labels), using SST maps and ground-truth heatmaps generated from trajectory data as inputs. Each input image passes through two convolutional layers, and each feature map is flattened and concatenated. We used the convolutional layer with a kernel size of 3, zero padding, and a stride of 1. The activation function is the identity function for the output layer and ReLU for the other layers, while the sigmoid function is applied only to the output layer during inference.

Evaluation

We evaluated the accuracy of the estimated fishing grounds by considering the geodesic distance, referred to as , between the peak coordinates derived from the output heatmap and the ground-truth coordinates. The evaluation criterion dictates that should be less than 200 km, the estimated maximum distance a fishing vessel can move within a single day, assuming a speed of 14 knots (equivalent to 26 km/h) for approximately eight hours. Precision, recall, and F1-score are the performance metrics utilized in the evaluation process.
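The 200 km threshold follows from simple arithmetic: 14 knots is about 25.9 km/h (1 knot = 1.852 km/h), and eight hours at that speed gives roughly 207 km, which is rounded to 200 km. The sketch below shows one way to turn the distance criterion into precision, recall, and F1-score. It assumes a greedy one-to-one matching between predicted peaks and ground-truth points and a haversine distance on a spherical Earth; the text does not specify the matching rule, so `evaluate` is an illustration rather than the evaluation code used in the experiments.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def evaluate(predictions, ground_truth, max_km=200.0):
    """Greedy one-to-one matching of predicted peaks to ground-truth points.

    A prediction is a true positive if an unmatched ground-truth point lies
    strictly within max_km of it; each ground-truth point is used at most once.
    """
    unmatched = list(ground_truth)
    tp = 0
    for p_lat, p_lon in predictions:
        best, best_d = None, max_km
        for i, (g_lat, g_lon) in enumerate(unmatched):
            d = haversine_km(p_lat, p_lon, g_lat, g_lon)
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            unmatched.pop(best)
            tp += 1
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


preds = [(35.0, 141.0), (40.0, 150.0), (25.0, 135.0)]
truth = [(35.5, 141.5), (30.0, 160.0)]
precision, recall, f1 = evaluate(preds, truth)
```

In this toy case only the first prediction falls within 200 km of a ground-truth point, giving a precision of 1/3 and a recall of 1/2.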

Experiments

Dataset

We obtained SST data from the National Oceanic and Atmospheric Administration (NOAA) [47] and catch data from the Shizuoka Prefectural Research Institute of Fishery and Ocean, specifically focusing on skipjack tuna fishing operation data. The fishing vessel trajectories were acquired from the Global Fishing Watch [22].

We set the designated sea area between 20°N and 50°N, between 130°E and 150°W, with a grid interval of 0.25°.
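The study area crosses the antimeridian, so longitudes need to be handled modulo 360° when converting coordinates to grid cells. The following sketch assumes that row 0 is the northern edge, column 0 is the western edge at 130°E, and cells are indexed by their north-west corner; under these assumptions the area spans 30° of latitude and 80° of longitude, that is, 120 × 320 cells at 0.25°. The layout and the function names are illustrative, not taken from the paper.

```python
import math

LAT_MIN, LAT_MAX = 20.0, 50.0
LON_WEST, LON_EAST = 130.0, -150.0  # 130°E to 150°W, crossing the antimeridian
STEP = 0.25  # grid interval in degrees


def grid_shape():
    """Return (rows, cols) of the grid covering the study area."""
    lon_span = (LON_EAST - LON_WEST) % 360.0
    return round((LAT_MAX - LAT_MIN) / STEP), round(lon_span / STEP)


def latlon_to_index(lat, lon):
    """Map (lat, lon) in degrees to (row, col); row 0 is the northern edge (assumed)."""
    rows, cols = grid_shape()
    row = math.floor((LAT_MAX - lat) / STEP)
    col = math.floor(((lon - LON_WEST) % 360.0) / STEP)
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("point lies outside the study area")
    return row, col


shape = grid_shape()
west_pacific = latlon_to_index(35.0, 140.0)
east_of_dateline = latlon_to_index(30.0, -170.0)
```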

Pre-processing. The pre-processing of vessel trajectories involved selecting only single-line fishing vessels to ensure data consistency with our target species, skipjack tuna. Furthermore, we applied an SST filter, excluding any trajectory points where the SST was below 19°C or above 26°C, since skipjack tuna are typically found within this temperature range.

For generating ground-truth heatmaps, we applied the Gaussian kernel from Eq (2) with a standard deviation (σ) of 7 to either the catch location or the averaged coordinates of each vessel’s daily activity to represent its location.

To prevent the over-densification of predicted points, peak detection was performed on the resulting heatmap using 8 × 8 maximum value filtering.
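Peak detection by maximum value filtering keeps a pixel only if it equals the maximum of its local window, which suppresses clusters of nearby predictions. Below is a minimal pure-Python sketch. The alignment of the even-sized 8 × 8 window (offsets -4 to +3 around each pixel) and the `threshold` parameter are assumptions; the function name `detect_peaks` is illustrative.

```python
def detect_peaks(heatmap, window=8, threshold=0.0):
    """Return (row, col) of pixels that equal the maximum of their local window.

    For an even window the neighbourhood covers offsets
    -window//2 .. window//2 - 1 around each pixel (an assumed alignment).
    Pixels with values <= threshold are never reported.
    """
    h, w = len(heatmap), len(heatmap[0])
    lo, hi = -(window // 2), window - window // 2  # hi is exclusive
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v <= threshold:
                continue
            local_max = max(
                heatmap[yy][xx]
                for yy in range(max(0, y + lo), min(h, y + hi))
                for xx in range(max(0, x + lo), min(w, x + hi))
            )
            if v == local_max:
                peaks.append((y, x))
    return peaks


# Two well-separated maxima and one weaker response next to the first.
grid = [[0.0] * 20 for _ in range(20)]
grid[5][5] = 0.9
grid[6][6] = 0.5   # suppressed: a larger value lies inside its window
grid[15][15] = 0.7
peaks = detect_peaks(grid)
```

Pixels on a flat plateau all equal their local maximum and would all be reported, so a practical implementation may need an additional tie-breaking step.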

Dataset composition. We utilized 2087 days of trajectory data collected between 2012 and 2017 for pre-training the model. Additionally, 371 days of catch data from 2012 to 2015 were used for fine-tuning the model and training the confidence network. For testing, we employed 154 days of catch data from 2018. Table 3 summarizes the number of images and the instance counts for each label across the different phases of the experiment.

Table 3. The number of images and instances for each label in the dataset.

“-” denotes a non-existent item.

https://doi.org/10.1371/journal.pone.0313772.t003

Experimental design

The dataset was divided into training, validation, and test sets to ensure a robust evaluation of model performance. Trajectory data from 2012 to 2017 was utilized for pre-training the target network, while catch data from 2012 to 2015 was split into training and validation sets, with a 0.2 validation split ratio applied. This catch data was also used to fine-tune the target network and to train the confidence network. The test set, consisting of 154 days of catch data from 2018, was reserved exclusively for final evaluation.

Training and hyperparameter selection. The training process involved two main stages: pre-training and fine-tuning. During pre-training, the model learned from trajectory data, and fine-tuning adjusted the network using catch data. In addition, the confidence network was trained on catch data, and its inference output was used as a weight to guide the pre-training of the target network.

Table 4 shows the hyperparameters used in the training of each network. For the target network, we selected the training epoch with the highest validation F1-score; for the confidence network, we selected the epoch with the lowest validation loss. The other hyperparameters were tuned empirically, balancing convergence speed and stability.

Results

We compared our proposed method (w/ confidence) with the following three alternatives: (catch only) a model trained using only catch data; (fine-tuning) our previous method of pre-training with heatmaps using both trajectories and catch data [13]; and (w/o confidence) our method without confidence in pre-training, which corresponds to existing weakly supervised learning methods.

Table 5 shows the experimental results. w/ confidence achieved the highest F1-score and recall. The precision of catch only was the highest, but its number of predictions and its recall were the lowest, indicating an under-detection problem. The methods using trajectory data for training increased the number of predictions and improved recall, and w/o confidence and w/ confidence showed further improvement because about five times more days were used for pre-training.

Fig 11 visualizes the output heatmaps. The methods that use trajectory data identify fishing locations not covered by the catch only approach. Although the outputs of the three methods with trajectory data are similar, there are cases where the w/ confidence approach produces more points or provides more accurate locations.

Fig 11. Examples of output heatmaps overlapped with SST maps:

(a) catch only (b) fine-tuning (c) w/o confidence (d) w/ confidence.  △  represents the estimated point and  +  represents the ground-truth coordinate.

https://doi.org/10.1371/journal.pone.0321116.g011

Discussion

Effectiveness of meta-learning

Fig 13 shows examples of the ground-truth heatmaps for which the confidence network output high and low values during pre-training. In the high-confidence cases, the heatmap exhibits larger non-zero areas or areas suitable for fishing grounds (similar to the spatial distribution of the catch record). Conversely, low-confidence instances typically feature heatmaps with smaller non-zero areas or areas unsuitable for fishing grounds, which are rarely observed in the catch record. Fig 12 shows the distribution of confidence values during pre-training. Because a large share of the samples receive low confidence, the latter type of heatmap has a diminished impact on model weight updates, contributing to improved performance during fine-tuning. Table 6 presents the results of evaluating the pre-trained model on the test data. w/ confidence aligns well with the catch data, indicating that the improved similarity between source and target tasks in transfer learning leads to enhanced fine-tuning performance.

Fig 13. Examples of input heatmaps in the cases where the confidence network output values are high and low.

They are resized for visualization. A red marker represents a catch record location in the training data (2012).

https://doi.org/10.1371/journal.pone.0321116.g013
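The down-weighting of low-confidence samples discussed in this section can be illustrated with a minimal sketch. It assumes, for illustration only, that each pre-training sample's gradient is scaled by a confidence value in [0, 1] before the weight update. The function name `confidence_weighted_step`, the toy gradients, and the confidence values are hypothetical and do not reproduce the authors' implementation or confidence network.

```python
def confidence_weighted_step(weights, sample_grads, confidences, base_lr=0.1):
    """One gradient step in which each sample's gradient is scaled by its confidence.

    ASSUMPTION: confidence acts as a per-sample multiplier on the learning rate.
    weights      -- list of floats (model parameters)
    sample_grads -- list of per-sample gradients, each a list like `weights`
    confidences  -- list of floats in [0, 1], one per sample
    """
    n = len(sample_grads)
    new_weights = list(weights)
    for grad, conf in zip(sample_grads, confidences):
        for k, g in enumerate(grad):
            # A low-confidence sample moves the weights less than a
            # high-confidence sample with a gradient of the same size.
            new_weights[k] -= base_lr * conf * g / n
    return new_weights


# Two toy samples with equally large gradients but different confidence.
weights = [0.0, 0.0]
sample_grads = [[1.0, 0.0], [0.0, 1.0]]
confidences = [0.9, 0.1]
updated = confidence_weighted_step(weights, sample_grads, confidences, base_lr=1.0)
print(updated)
```

In this toy case the first weight, driven by the high-confidence sample, changes nine times as much as the second, which mirrors how low-confidence heatmaps contribute less to pre-training.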

Change of loss function

As an alternative to using trajectory data, we employed a modified loss function to address under-detection, and we also examined its synergistic effect with trajectory-based pre-training. The hinge loss proposed in [12] is defined as follows:

(7)

where c is a channel of a ground-truth heatmap, and α is a hyperparameter between 0 and 1. This loss sets the over-detection penalty to zero, which leads to an increase in the number of predictions.

However, as shown in Table 7, this loss function improved the F1-score for catch only but not for w/ confidence. The discrepancy arises because hinge loss does not penalize over-detection: the penalty for over-detection is 0 regardless of the output position. In the pre-trained state, where the number of predictions has already increased, the model therefore makes additional predictions at unsuitable fishing locations. These predictions do not correspond to actual fishing grounds, so precision decreases without a corresponding increase in recall.

Table 7. Evaluation results for changes in the loss function.

https://doi.org/10.1371/journal.pone.0313772.t007

This behavior suggests that hinge loss, while helpful for increasing predictions and avoiding under-detection, may not align with the objectives of our task, in which the balance between increasing predictions and maintaining precision is critical. Using hinge loss in this context therefore requires further tuning so that the increase in predictions does not degrade precision.
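The effect of a zero over-detection penalty can be made concrete with a toy loss. The sketch below is not Eq (7) from [12]; it is an assumed asymmetric squared-error loss in which a hypothetical parameter `over_weight` scales the over-detection term, and the one-row heatmaps are invented for illustration.

```python
def asymmetric_heatmap_loss(pred, target, over_weight=0.0):
    """Sum of squared errors with a separately weighted over-detection term.

    ASSUMPTION: this is an illustrative stand-in, not the hinge loss of [12].
    pred, target -- 2D lists of floats with the same shape
    over_weight  -- hypothetical weight applied where pred > target
    """
    total = 0.0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            diff = p - t
            if diff > 0:
                # Over-detection: the output exceeds the ground truth.
                total += over_weight * diff * diff
            else:
                # Under-detection (or exact match): always fully penalized.
                total += diff * diff
    return total


target = [[1.0, 0.0, 0.0]]          # one ground-truth peak at the first cell
extra_near = [[1.0, 0.8, 0.0]]      # spurious peak next to the true one
extra_far = [[1.0, 0.0, 0.8]]       # spurious peak far from the true one
missed = [[0.2, 0.0, 0.0]]          # true peak mostly missed

print(asymmetric_heatmap_loss(extra_near, target))  # over-detection is free
print(asymmetric_heatmap_loss(extra_far, target))   # regardless of position
print(asymmetric_heatmap_loss(missed, target))      # under-detection is penalized
```

With `over_weight=0.0`, a spurious peak costs nothing whether it lies near or far from the true location, so nothing discourages extra predictions in unsuitable areas. A positive `over_weight` restores a penalty for them.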

Limitations

The F1-scores from our experiments are significantly lower than those of general keypoint-detection tasks such as human pose estimation. This performance gap can be attributed to the intrinsic complexity of the task. Unlike human keypoints, which follow anatomically fixed positions, fishing grounds are dynamic and influenced by oceanographic conditions, vessel behavior, and ecological factors. Additionally, while human pose estimation benefits from precise, manually annotated datasets, fishing ground labels are inferred from incomplete catch data and noisy trajectory data, introducing additional uncertainty.

The incompleteness of catch data annotations hindered our model’s performance, affecting both the fine-tuning and the meta-learning strategies, which rely on accurate ground-truth heatmaps derived from catch data. A potential strategy to improve fishing ground annotations is to incorporate sonar data from buoys and vessels. Brehmer et al. demonstrated that omnidirectional multibeam sonar, commonly found on fishing vessels, provides real-time monitoring of fish schools, offering valuable data for more accurate fishing ground identification. Integrating sonar data with catch and trajectory data could increase the volume of reliable data and improve the overall quality of annotations.

Additionally, our method did not surpass the precision achieved by training solely with catch data, which underscores the importance of preserving the high precision of catch only. To achieve higher precision, additional input features that play a role in shaping fishing grounds, such as SSH and sea current velocity, should be included. Moreover, refining the network architecture to better capture spatial dependencies remains an essential direction for future work. The resolution of the heatmaps also plays a crucial role in performance, as fishing grounds are not fixed points but spatially distributed regions. Higher-resolution heatmaps could allow finer localization of fishing grounds.

We evaluated predictions using a geodetic distance threshold of 200 km. This threshold was determined from the distance that fishing vessels can travel in a day and the resolution of the SST data. If higher-resolution SST data become available, evaluation with a shorter distance threshold should be possible.
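One way such a distance-thresholded evaluation could be computed is sketched below. It assumes a haversine (spherical great-circle) approximation of the geodetic distance and a greedy one-to-one matching of predictions to ground-truth points. The matching rule, the function names, and the example coordinates are assumptions for illustration and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_scores(predicted, ground_truth, threshold_km=200.0):
    """Precision, recall, and F1 under greedy nearest-first one-to-one matching.

    ASSUMPTION: a prediction is a true positive if it can be paired with an
    unmatched ground-truth point no farther than threshold_km away.
    """
    pairs = sorted(
        (haversine_km(p, g), i, j)
        for i, p in enumerate(predicted)
        for j, g in enumerate(ground_truth)
    )
    used_pred, used_gt = set(), set()
    true_positives = 0
    for dist, i, j in pairs:
        if dist > threshold_km:
            break  # remaining pairs are even farther apart
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical (lat, lon) points: only the first prediction lies within
# 200 km of a ground-truth point.
predicted = [(35.0, 140.0), (30.0, 150.0)]
ground_truth = [(35.5, 140.5), (20.0, 130.0)]
print(match_scores(predicted, ground_truth))
```

Lowering `threshold_km` makes the criterion stricter, which is the kind of finer evaluation that higher-resolution SST data would support.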

Lastly, the experimental settings, including the target fish species, target areas, and target years, are currently limited. While this study focuses on skipjack tuna as a representative case, the proposed method is not species-specific and can be extended to other fisheries with appropriate data. Future research should explore broader applications to different species and regions to further assess the generalizability of the approach.

Conclusion

We proposed a method for estimating fishing grounds, which involves meta-learning for initial pre-training using fishing vessel trajectories as a form of weak supervision, followed by fine-tuning with a limited quantity of catch data. Our experimental results confirmed the effectiveness of the pre-training and meta-learning methods.

Our future work will address the identified limitations to enhance the proposed method. Specifically, we aim to augment the catch dataset by incorporating sonar data from buoys and vessels to improve the accuracy of annotations. Additionally, we will enhance the precision of our method by including additional input features, such as SSH and sea current velocity, and by optimizing the network architecture. Furthermore, we plan to evaluate the versatility of the proposed method in a more diverse range of settings, including different target fish species, areas, and years, to ensure its applicability across various scenarios.

Acknowledgments

We sincerely thank Shizuoka Prefectural Research Institute of Fishery and Ocean for providing the fish catch data essential to our research.

References

  1. Chen X, Tian S, Chen Y, Liu B. A modeling approach to identify optimal habitat and suitable fishing grounds for neon flying squid (Ommastrephes bartramii) in the Northwest Pacific Ocean; 2010. Available from: https://spo.nmfs.noaa.gov/sites/default/files/pdf-content/2010/1081/chen.pdf
  2. Vayghan AH, Poorbagher H, Shahraiyni HT, Fazli H, Saravi HN. Suitability indices and habitat suitability index model of Caspian kutum (Rutilus frisii kutum) in the southern Caspian Sea. Aquat Ecol. 2013;47:441–51. http://dx.doi.org/10.1007/s10452-013-9457-9
  3. Mondal S, Vayghan AH, Lee MA, Wang YC, Semedi B. Habitat suitability modeling for the feeding ground of immature albacore in the Southern Indian Ocean using satellite-derived sea surface temperature and chlorophyll data. Remote Sens. 2021;13:2669–84. http://dx.doi.org/10.3390/rs13142669
  4. Hsu TY, Chang Y, Lee MA, Wu RF, Hsiao SC. Predicting skipjack tuna fishing grounds in the Western and Central Pacific Ocean based on high-spatial-temporal-resolution satellite data. Remote Sens. 2021;13:861–76. http://dx.doi.org/10.3390/rs13050861
  5. Zainuddin M, Saitoh K, Saitoh SI. Albacore (Thunnus alalunga) fishing ground in relation to oceanographic conditions in the western North Pacific Ocean using remotely sensed satellite data. Fish Oceanogr. 2008;17:61–73. http://dx.doi.org/10.1111/j.1365-2419.2008.00461.x
  6. Mugo R, Saitoh SI, Nihira A, Kuroyama T. Habitat characteristics of skipjack tuna (Katsuwonus pelamis) in the western North Pacific: a remote sensing perspective. Fish Oceanogr. 2010;19:382–96. http://dx.doi.org/10.1111/j.1365-2419.2010.00552.x
  7. Alabia ID, Saitoh SI, Mugo R, Igarashi H, Ishikawa Y, Usui N, et al. Seasonal potential fishing ground prediction of neon flying squid (Ommastrephes bartramii) in the western and central North Pacific. Fish Oceanogr. 2015;24:190–203. http://dx.doi.org/10.1111/fog.12102
  8. Zainuddin M, Nelwan AFP, Farhum SA, Najamuddin, Hajar MAI, Kurnia M, et al. Characterizing potential fishing zone of skipjack tuna during the southeast monsoon in the Bone Bay-Flores Sea using remotely sensed oceanographic data. Int J Geosci. 2013;4:259–66. http://dx.doi.org/10.4236/ijg.2013.41A023
  9. Maina I, Kavadas S, Katsanevakis S, Somarakis S, Tserpes G, Georgakarakos S. A methodological approach to identify fishing grounds: a case study on Greek trawlers. Fish Res. 2016;183:326–39. http://dx.doi.org/10.1016/j.fishres.2016.06.021
  10. Nurdin S, Mustapha MA, Lihan T, Zainuddin M. Applicability of remote sensing oceanographic data in the detection of potential fishing grounds of Rastrelliger kanagurta in the archipelagic waters of Spermonde, Indonesia. Fish Res. 2017;196:1–12. http://dx.doi.org/10.1016/j.fishres.2017.07.029
  11. Fu A, Patil KR, Iiyama M. Region proposal and regression network for fishing spots detection from sea environment. IEEE Access. 2021;9:68366–75. http://dx.doi.org/10.1109/ACCESS.2021.3077514
  12. Nakata S, Takasan K, Iiyama M. Fishing ground estimation using deep-learning-based keypoint detector. In: OCEANS 2023; 2023. p. 1–5. http://dx.doi.org/10.1109/OCEANSLimerick52467.2023.10244446
  13. Takasan K, Iiyama M. Weakly-supervised keypoint detector for fishing ground estimation. In: OCEANS 2023 - MTS/IEEE U.S. Gulf Coast; 2023. p. 1–5. http://dx.doi.org/10.23919/OCEANS52994.2023.10337078
  14. Dehghani M, Severyn A, Rothe S, Kamps J. Learning to learn from weak supervision by full supervision. In: NIPS Workshop on Meta-Learning (MetaLearn 2017); 2017. p. 1–1. http://dx.doi.org/10.48550/arXiv.1711.11383
  15. Mugo R, Saitoh SI. Ensemble modelling of skipjack tuna (Katsuwonus pelamis) habitats in the Western North Pacific using satellite remotely sensed data; a comparative analysis using machine-learning models. Remote Sens. 2020;12(16):2591–605. http://dx.doi.org/10.3390/rs12162591
  16. Fitrianah D, Praptono NH, Hidayanto AN, Arymurthy AM. Feature exploration for prediction of potential tuna fishing zones. Int J Inf Eng Electron Bus. 2015;5(4):270–4. http://dx.doi.org/10.7763/IJIEE.2015.V5.543
  17. Wang J, Yu W, Chen X, Lei L, Chen Y. Detection of potential fishing zones for neon flying squid based on remote-sensing data in the Northwest Pacific Ocean using an artificial neural network. Int J Remote Sens. 2015;36:3317–30. http://dx.doi.org/10.1080/01431161.2015.1042121
  18. Armas E, Arancibia H, Neira S. Identification and forecast of potential fishing grounds for anchovy (Engraulis ringens) in Northern Chile using neural networks modeling. Fishes. 2022;7:204–13. http://dx.doi.org/10.3390/fishes7040204
  19. Guan H, Zhao X. Study on the prediction system of shrimp field distribution in the East China Sea based on big data analysis of fishing trajectories. J Ocean Univ China. 2021;20:228–34. http://dx.doi.org/10.1007/s11802-021-4518-5
  20. Iiyama M, Zhao K, Hashimoto A, Kasahara H, Minoh M. Fishing spot prediction by sea temperature pattern learning. In: 2018 OCEANS - MTS/IEEE Kobe Techno-Oceans (OTO); 2018. p. 1–4. http://dx.doi.org/10.1109/OCEANSKOBE.2018.8559299
  21. Geronimo RC, Franklin EC, Brainard RE, Elvidge CD, Santos MD, Venegas R, et al. Mapping fishing activities and suitable fishing grounds using nighttime satellite images and maximum entropy modelling. Remote Sens. 2018;10(10):1604–1626. http://dx.doi.org/10.3390/rs10101604
  22. Kroodsma DA, Mayorga J, Hochberg T, Miller NA, Boerder K, Ferretti F, et al. Tracking the global footprint of fisheries. Science. 2018;359(6378):904–08. http://dx.doi.org/10.1126/science.aao5646
  23. de Souza EN, Boerder K, Matwin S, Worm B. Improving fishing pattern detection from satellite AIS using data mining and machine learning. PLoS ONE. 2016;11(7):1–20. http://dx.doi.org/10.1371/journal.pone.0163760
  24. Adibi P, Pranovi F, Raffaetà A, Russo E, Silvestri C, Simeoni M, et al. Predicting fishing effort and catch using semantic trajectories and machine learning. In: Tserpes K, Renso C, Matwin S (eds); 2020. p. 83–83. http://dx.doi.org/10.1007/978-3-030-38081-6-7
  25. Toshev A, Szegedy C. DeepPose: human pose estimation via deep neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition; 2014. pp. 1653–60. http://dx.doi.org/10.1109/CVPR.2014.214
  26. Sun X, Xiao B, Wei F, Liang S, Wei Y. Integral human pose regression. In: Proceedings of the European conference on computer vision (ECCV); 2018. pp. 529–45. http://dx.doi.org/10.1007/978-3-030-01231-1-33
  27. Carreira J, Agrawal P, Fragkiadaki K, Malik J. Human pose estimation with iterative error feedback. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015; pp. 4733–42. http://dx.doi.org/10.1109/CVPR.2016.512
  28. Sun K, Xiao B, Liu D, Wang J. Deep high-resolution representation learning for human pose estimation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019. pp. 5686–96. http://dx.doi.org/10.1109/CVPR.2019.00584
  29. Cao Z, Hidalgo G, Simon T, Wei SE, Sheikh Y. OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans Pattern Anal Mach Intell. 2018;43:172–86. http://dx.doi.org/10.1109/TPAMI.2019.2929257
  30. Osokin D. Real-time 2D multi-person pose estimation on CPU: lightweight OpenPose. In: International Conference on Pattern Recognition Applications and Methods; 2019. pp. 744–8. http://dx.doi.org/10.5220/0007555407440748
  31. Xiao B, Wu H, Wei Y. Simple baselines for human pose estimation and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 466–81. http://dx.doi.org/10.1007/978-3-030-01231-1-29
  32. Luo Z, Wang Z, Huang Y, Tan T, Zhou E. Rethinking the heatmap regression for bottom-up human pose estimation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020; pp. 13259–68. http://dx.doi.org/10.1109/CVPR46437.2021.01306
  33. Zhou ZH. A brief introduction to weakly supervised learning. Natl Sci Rev. 2018;5:44–53. http://dx.doi.org/10.1093/nsr/nwx106
  34. van Engelen JE, Hoos HH. A survey on semi-supervised learning. Mach Learn. 2020;109:373–440. http://dx.doi.org/10.1007/s10994-019-05855-6
  35. Song H, Kim M, Park D, Shin Y, Lee JG. Learning from noisy labels with deep neural networks: a survey. IEEE Trans Neural Netw Learn Syst. 2023;34(11):8135–53. http://dx.doi.org/10.1109/TNNLS.2022.3152527
  36. Oquab M, Bottou L, Laptev I, Sivic J. Is object localization for free? - Weakly-supervised learning with convolutional neural networks. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015. pp. 685–94. http://dx.doi.org/10.1109/CVPR.2015.7298668
  37. Ryou S, Perona P. Weakly supervised keypoint discovery. ArXiv. preprint. 2021;abs/2109.13423:1–14. http://dx.doi.org/10.48550/arXiv.2109.13423
  38. Han J, Zhang D, Cheng G, Guo L, Ren J. Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning. IEEE Trans Geosci Remote Sens. 2014;53(6):3325–37. http://dx.doi.org/10.1109/TGRS.2014.2374218
  39. Zhang H, Chen F, Shen Z, Hao Q, Zhu C, Savvides M. Solving missing-annotation object detection with background recalibration loss. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics; 2020. pp. 1888–1888. http://dx.doi.org/10.1109/ICASSP40776.2020.9053738
  40. Wu Z, Bodla N, Singh B, Najibi M, Chellappa R, Davis LS. Soft sampling for robust object detection. In: British Machine Vision Conference 2019; 2019. pp. 225–36. http://dx.doi.org/10.48550/arXiv.1806.06986
  41. Wang T, Yang T, Cao J, Zhang X. Co-mining: self-supervised learning for sparsely annotated object detection. Proc AAAI Conf Artif Intell. 2021;35(4):2800–8. http://dx.doi.org/10.48550/arXiv.2012.01950
  42. Jia C, Yang Y, Xia Y, Chen YT, Parekh Z, Pham H, et al. Scaling up visual and vision-language representation learning with noisy text supervision. In: Meila M, Zhang T (eds). Proceedings of the 38th International Conference on Machine Learning. Proc Mach Learn Res. 2021;139:4904–4916. Available from: https://proceedings.mlr.press/v139/jia21b.html
  43. Joulin A, van der Maaten L, Jabri A, Vasilache N. Learning visual features from large weakly supervised data. In: Leibe B, Matas J, Sebe N, Welling M (eds). Computer vision – ECCV 2016. Cham: Springer; 2016. pp. 67–84. http://dx.doi.org/10.1007/978-3-319-46478-7-5
  44. Dehghani M, Zamani H, Severyn A, Kamps J, Croft WB. Neural ranking models with weak supervision. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’17. New York; 2017. pp. 65–65. http://dx.doi.org/10.1145/3077136.3080832
  45. Zhang D, Han J, Cheng G, Yang MH. Weakly supervised object localization and detection: a survey. IEEE Trans Pattern Anal Mach Intell. 2021;44:5866–85. http://dx.doi.org/10.1109/TPAMI.2021.3074313
  46. Andrychowicz M, Denil M, Gómez S, Hoffman MW, Pfau D, Schaul T, et al. Learning to learn by gradient descent by gradient descent. In: Lee D; 2016. pp. 1–1. http://dx.doi.org/10.5555/3157382.3157543
  47. Huang B, Liu C, Banzon VF, Freeman E, Graham G, Hankins B, et al. NOAA 0.25-degree Daily Optimum Interpolation Sea Surface Temperature (OISST), Version 2.1; 2020. Available from: https://doi.org/10.25921/RE9P-PT57 [cited May 19, 2023].