A deep learning-based model for automatic identification of mesopelagic organisms from in-trawl cameras | PLOS One

Table 1.

In-trawl camera data included in this study. Period: sampling month and year; area: area the survey covered; trawl: types of pelagic trawls used ranging from macroplankton trawl to fish trawls (VITO, Harstad); gain: gain of the camera; frames per second: frame-rate at which images were recorded; resolution: image width and height in pixels; light: colour of the artificial light source.

Fig 1.

Study area.
Map of the Norwegian Sea featuring locations of pelagic hauls (circles) used for the development of an object-detection model for mesopelagic organisms from in-trawl camera images. Circles are coloured based on image type, according to the lighting and camera gain used during collection: white, red gain 1.5, and red gain 5. Inset: Map of West-Norwegian fjords. The map was produced in R v. 4.5.2 [57], using the ggOceanMaps package v. 2.2.0 [58].

Fig 2.

Mounting of the Deep Vision in-trawl camera system inside three models of pelagic trawls.
The stretched circumference is 280, 397, and 320 m for the Macroplankton, VITO, and Harstad trawl, respectively. The Deep Vision section consists of two 3 m long sections that guide the organisms inside the imaging chamber of the 1.5 m long in-trawl camera system. The length and stretched mesh size are specified for the trawl body and codend of each of the three pelagic trawl models.

Fig 3.

Intensity of light emitted by the Deep Vision in-trawl camera across wavelengths with white (grey curve) and red light (red curve).
Dashed lines indicate peak sensitivity of Maurolicus muelleri (λmax 441 nm; see de Busserolles [53]) and Benthosema suborbitale (λmax 487 nm; see Douglas and Partridge [60]). Based on visual properties, B. suborbitale is the nearest relative to the dominant lanternfish B. glaciale for which light sensitivity data were available (see Fig 15 in de Busserolles [52]). The figure was produced in R v. 4.5.2 [57].

Fig 4.

Detailed anatomy of the Deep Vision during 2022–2024, 2021 and 2020: (a) imaging chamber, (b) divider to reduce the volume of the imaging chamber (2021 only), (c) two stroboscopic lights, (d) stereo camera, and (e) battery.
Image processing steps applied after collection using the “Deep Vision HMI” software (Scantrol Deep Vision AS, Bergen, Norway): geometric rectification to facilitate length measurements from images (final images with rounded sides); colour-correction applied to all images collected after the year 2020. Examples of the types of images used for manual annotation.

Table 2.

Acronyms and definitions of machine learning terms used throughout the manuscript. For a more detailed explanation of the performance evaluation metrics, refer to [68,69].

Table 3.

Annotations per object class (species groups) for the manually annotated, non-colour corrected, and synthetic datasets. The manually annotated datasets, indicated in bold, include colour-corrected white (W), colour-corrected red gain 1.5 (R1.5), and red gain 5 (R5) images. For each manually annotated dataset, the total number and proportion of annotations, as well as the numbers of annotations used for training (tr), validation (va), and testing (te), are provided. The non-colour corrected white (Wn), non-colour corrected red gain 1.5 (R1.5n), and synthetic red gain 5 (R5s) images were used to augment the number of annotations during training and were always used in combination with manual annotations (W_tr, R1.5_tr, R5_tr). The last row displays the total number of images annotated for each dataset.

Table 4.

Composition of the annotation dataset across training (tr), validation (va) and test (te). In total, this study tested nine different training sets of varying sizes (W_tr, R1.5_tr, R5_tr, Wn_tr, R1.5n_tr, R5s_tr, WR_tr, WRn_tr, WRns_tr). Validation (W_va, R1.5_va, R5_va, WR_va) and test sets (W_te, R1.5_te, R5_te) consisted of only manually annotated images. The number of manual, non-colour corrected, and synthetic annotations for each object class can be found in Table 3.

Table 5.

Definitions of causes for false positive (FP) and false negative (FN) detections used in the error analysis. Cropped images from each test set to illustrate these causes can be found in S6 Fig.

Fig 5.

Effect of training set size and image resolution on weighted mean average precision (weighted mAP).
(A) Effect of training set size on the weighted mAP of the mesopelagic detector at a constant image resolution of 1216 pixels. (B) Effect of image resolution (image width in pixels) on the weighted mAP of the mesopelagic detector for the model runs trained on WRns_tr. The composition of each training set is described in Table 3 and Table 4. Performance was evaluated separately for each test set: white (W_te), red gain 1.5 (R1.5_te), red gain 5 (R5_te). The figure was produced in R v. 4.5.2 [57].

Fig 6.

Effect of training set size and image resolution on average precision (AP) of each object class.
(A) Effect of training set size on the AP of each object class at a constant image resolution of 1216 pixels. (B) Effect of image resolution (image width in pixels) on the AP of each object class for the model runs trained on WRns_tr. Data points are coloured based on the object class and shaped according to higher taxonomic groups (triangle: fishes, diamond: crustaceans, circle: other). The composition of each training set is described in Table 3 and Table 4. Performance was evaluated separately for each test set: white (W_te), red gain 1.5 (R1.5_te), red gain 5 (R5_te). The figure was produced in R v. 4.5.2 [57].

Fig 7.

Cropped images from each test set to illustrate the effect of training set size and image resolution: white (W_te, left column), red gain 1.5 (R1.5_te, middle column), and red gain 5 (R5_te, right column).
Images are overlain with manual annotations (black) and automatic detections from model runs with varying training set size and image resolutions (coloured and labelled by object class). Columns: smallest training sets (W_tr, R1.5_tr, R5_tr) and largest training set (WRns_tr). Rows: minimum (640 pixels) and maximum (1216 pixels) image resolution. The composition of each training set is described in Table 3 and Table 4.

Fig 8.

Weighted mean Average Precision (weighted mAP) and Average Precision (AP) of each object class for the best-performing model (training set: WRns_tr, image width: 1216 pixels).
The model was tested separately on white (W_te, grey), red gain 1.5 (R1.5_te, dark red) or red gain 5 (R5_te, light red). The training and test sets are described in Table 3 and Table 4. The APs of all object classes were used to calculate the mAP; however, the object classes with a grey background were excluded from the analysis since they were present in only one of the test sets or had too few annotations. The figure was produced in R v. 4.5.2 [57].
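As a rough illustration of how per-class AP values can be combined into a single weighted score, the sketch below weights each class by its number of test annotations. This weighting scheme is an assumption made here for illustration; the study's own definition of weighted mAP is given in its Table 2. The function name `weighted_map` and all numbers are hypothetical.

```python
def weighted_map(ap_per_class, n_per_class):
    """Mean of per-class AP, weighted by annotation counts (assumed weighting)."""
    total = sum(n_per_class[c] for c in ap_per_class)
    if total == 0:
        return 0.0
    return sum(ap_per_class[c] * n_per_class[c] for c in ap_per_class) / total


# Hypothetical AP values and annotation counts, not results from the study.
ap = {"krill": 0.8, "pelagic shrimp": 0.6, "gelatinous zooplankton": 0.5}
counts = {"krill": 100, "pelagic shrimp": 50, "gelatinous zooplankton": 50}

score = weighted_map(ap, counts)
unweighted = sum(ap.values()) / len(ap)
print(f"weighted mAP: {score:.3f}, unweighted mAP: {unweighted:.3f}")
```

Under this assumed weighting, abundant classes pull the score towards their own AP, which is why the weighted and unweighted means differ in the example.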

Fig 9.

F1-confidence curves of each object class for the best-performing model (training set: WRns_tr, image width: 1216 pixels).
The model was tested separately on white (W_te, grey), red gain 1.5 (R1.5_te, dark red) or red gain 5 (R5_te, light red). The dashed line represents a confidence threshold of 0.05, which was the value used at evaluation. The solid line represents a confidence threshold of 0.5 and serves as a reference point. The object classes with a grey background were excluded from the analysis since they were present in only one of the test sets or had too few annotations. The figure was produced in R v. 4.5.2 [57].
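To make the role of the confidence threshold concrete, here is a minimal sketch of how one point on an F1-confidence curve can be computed for a single object class. It assumes each detection has already been marked as a true or false positive against the annotations; the function name `f1_at_threshold` and the example detections are hypothetical and not taken from the study.

```python
def f1_at_threshold(detections, n_annotations, threshold):
    """F1 score for one class, keeping detections with confidence >= threshold.

    detections: list of (confidence, is_true_positive) pairs.
    n_annotations: number of annotated objects of this class.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / n_annotations if n_annotations else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical detections for one class with four annotated objects.
dets = [(0.9, True), (0.6, True), (0.3, False),
        (0.1, True), (0.07, False), (0.06, False)]

for thr in (0.05, 0.5):
    print(thr, round(f1_at_threshold(dets, 4, thr), 3))
```

Sweeping `threshold` from 0 to 1 and plotting the returned values gives the kind of curve shown in the figure; the two thresholds in the loop correspond to the dashed and solid reference lines.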

Fig 10.

Confusion matrices of the best-performing model (training set: WRns_tr, image width: 1216 pixels).
The matrices were generated at the default confidence threshold of 0.25 and IoU threshold of 0.5 for each test set: white (W_te), red gain 1.5 (R1.5_te), red gain 5 (R5_te). Object classes in bold are the focus of this analysis, as they were present in at least two test datasets and had sufficient annotations. Missed: annotations in the test set that the model did not detect. Background: model detections that were not annotated. The figure was produced in R v. 4.5.2 [57].
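For readers unfamiliar with the IoU threshold mentioned in this caption, the following is a minimal sketch of how intersection over union is computed for two axis-aligned bounding boxes and compared against 0.5. The box format `(x_min, y_min, x_max, y_max)`, the function name `iou`, and the example boxes are illustrative assumptions, not taken from the study's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


IOU_THRESHOLD = 0.5  # value stated in the caption

annotation = (10, 10, 50, 50)  # hypothetical manual annotation
detection = (20, 20, 60, 60)   # hypothetical model detection

overlap = iou(annotation, detection)
is_match = overlap >= IOU_THRESHOLD
print(f"IoU = {overlap:.3f}, counted as a match: {is_match}")
```

In this example the boxes overlap by less than half of their combined area, so the detection would not be matched to the annotation: the annotation would count as missed and the detection as background.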

Fig 11.

Analysis of false positive (FP) and false negative (FN) detections of krill, pelagic shrimp, and gelatinous zooplankton for the three test sets: white (W_te), red gain 1.5 (R1.5_te), red gain 5 (R5_te).
The detections were produced by running the best-performing model (training set: WRns_tr, image width: 1216 pixels) and applying a confidence threshold of 0.05 and an IoU threshold of 0.5 on the images of the three test sets. Annotations are human labels assigned to each object in an image. Up to 50 FPs and FNs were randomly subsampled from each test set. For each FP or FN bounding box, the immediate cause of the error was noted, as well as its visible morphological features, overlap with other objects, and the degree of contrast with the background. Values are given as proportions, with the number of bounding boxes in parentheses. For krill in the R5_te test set, only 10 FN annotations were available; therefore, the proportions should be treated with caution. Because of misclassification, the visible morphological features recorded for krill and pelagic shrimp include pigmented, partially pigmented, and transparent, categories that were otherwise used only for gelatinous zooplankton, and vice versa. The figure was produced in R v. 4.5.2 [57].
