Fig 1.
The network architecture of YOLOv8 comprises three parts.
The Backbone is responsible for feature extraction. The Neck, situated between the Backbone and the Head, is responsible for feature fusion. The Head is responsible for outputting the detection results.
Fig 2.
The decomposition diagram of the large kernel convolution, where the blue grid represents the convolution kernel and the green grid represents the center point.
The large kernel convolution in the figure is decomposed into a depthwise convolution, a depthwise dilated convolution, and a pointwise convolution.
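The decomposition in this caption can be made concrete with a small parameter count. The sketch below assumes the kernel-size rule from the original large kernel attention formulation, which the caption itself does not state: a K×K kernel with dilation d is approximated by a (2d-1)×(2d-1) depthwise convolution, a ⌈K/d⌉×⌈K/d⌉ depthwise dilated convolution, and a 1×1 pointwise convolution. The function names are hypothetical and bias terms are ignored.

```python
import math


def lka_decomposition(kernel_size: int, dilation: int) -> dict:
    """Kernel sizes of the three stages (assumed rule, see lead-in)."""
    return {
        "dw_kernel": 2 * dilation - 1,                          # depthwise conv
        "dw_dilated_kernel": math.ceil(kernel_size / dilation),  # depthwise dilated conv
        "pw_kernel": 1,                                          # pointwise conv
    }


def param_counts(channels: int, kernel_size: int, dilation: int) -> tuple:
    """Weights of a dense KxK conv vs. the decomposed version (no bias)."""
    parts = lka_decomposition(kernel_size, dilation)
    dense = channels * channels * kernel_size ** 2
    decomposed = (
        channels * parts["dw_kernel"] ** 2            # one filter per channel
        + channels * parts["dw_dilated_kernel"] ** 2  # one filter per channel
        + channels * channels                         # 1x1 channel mixing
    )
    return dense, decomposed


if __name__ == "__main__":
    print(lka_decomposition(21, 3))
    print(param_counts(64, 21, 3))
```

Under these assumptions, a 21×21 kernel with dilation 3 becomes a 5×5 depthwise convolution followed by a 7×7 depthwise dilated convolution and a 1×1 convolution, which is where the savings in weights come from.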
Fig 3.
The DLKA structure comprises the DLKA-Attention module and the FFN module. The DLKA-Attention module consists mainly of the DLKA module, which is built from deformable depthwise convolutions and deformable dilated convolutions.
The FFN module is composed of deformable convolutions.
Fig 4.
Structure of the C2F-SimDLKA module. After processing with CBS, the features are split into two parts: one part is retained without further processing, and the other part is passed through several SimDLKA modules.
The output of each SimDLKA module follows two paths: one path feeds the processed features to the next SimDLKA module, while the other retains them for later concatenation. Finally, after the n SimDLKA modules, all retained features are fused together.
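The split-and-concatenate flow described in this caption can be sketched without any deep learning library. The code below is only an illustration of the data flow: a feature map is modelled as a flat list of channel values, `double` is a hypothetical stand-in for a SimDLKA module, and the CBS layers before the split and after the concatenation are omitted. It also assumes, as in the standard C2f layout, that the unprocessed input of the first SimDLKA module is itself kept for concatenation.

```python
from typing import Callable, List, Sequence

Feature = List[float]  # stand-in for the channel dimension of a tensor


def c2f_flow(x: Feature, blocks: Sequence[Callable[[Feature], Feature]]) -> Feature:
    """Split x in two, chain the blocks on the second half, concatenate everything."""
    half = len(x) // 2
    kept, current = x[:half], x[half:]   # part 1 is retained untouched
    branches = [kept, current]           # assumption: part 2 is kept as well
    for block in blocks:
        current = block(current)         # passed on to the next block
        branches.append(current)         # and retained for concatenation
    # Channel-wise concatenation of all retained features.
    return [v for branch in branches for v in branch]


def double(feature: Feature) -> Feature:
    """Hypothetical placeholder for one SimDLKA module."""
    return [2 * v for v in feature]


if __name__ == "__main__":
    print(c2f_flow([1.0, 2.0, 3.0, 4.0], [double, double]))
```

With n blocks and c channels per half, the concatenated result has (2 + n)·c channels, which a final fusion layer would then project back to the desired width.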
Fig 5.
Comparison of the curves of the DCIOU loss function and the CIOU loss function.
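For reference, the CIOU side of this comparison can be written out in a few lines. The sketch below implements the commonly used Complete IoU loss for a single pair of axis-aligned boxes in (x1, y1, x2, y2) form; the function name and the `eps` stabiliser are assumptions for illustration. The DCIOU loss is not defined in this caption, so it is deliberately not reproduced here.

```python
import math


def ciou_loss(box, gt, eps: float = 1e-9) -> float:
    """Complete IoU loss: 1 - IoU + center-distance term + aspect-ratio term."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((x1 + x2) - (gx1 + gx2)) ** 2 / 4 + ((y1 + y2) - (gy1 + gy2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Unlike plain IoU, the loss keeps growing with center distance even when the boxes do not overlap, which is the behaviour a loss-curve comparison of this kind typically examines.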
Table 1.
AP values of different methods on the three datasets, reported as four indicators: AP@50, AP@75, AP@M, and AP@L.
Table 2.
FPS values of different methods on the three datasets.
Fig 6.
AP values of different methods on the three datasets, including four indicators: AP@50, AP@75, AP@M, and AP@L.
Blue represents AP@50, orange represents AP@75, green represents AP@M, and red represents AP@L.
Fig 7.
FPS values of different methods on the three datasets, where red represents the COCO dataset, yellow represents the MPII dataset, and green represents the HP dataset.
Table 3.
AP values of three methods on the three datasets, including four indicators: AP@50, AP@75, AP@M, and AP@L. Experiment one is the baseline model with no additional modules, experiment two is the baseline model with the LKA module added, and experiment three is the baseline model with the SimDLKA module added.
Table 4.
AP values of three loss functions on the three datasets, including four indicators: AP@50, AP@75, AP@M, and AP@L.