Enhanced human pose estimation using YOLOv8 with Integrated SimDLKA attention mechanism and DCIOU loss function: Analysis of human body behavior and posture

doi:10.1371/journal.pone.0318578

Fig 1.

The network architecture of YOLOv8 comprises three parts.

The Backbone is responsible for feature extraction. The Neck, situated between the Backbone and the Head, is responsible for feature fusion. The Head is responsible for outputting the detection results.
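The sketch below illustrates how the Neck and Head are organized by counting the prediction cells a YOLOv8-style Head produces. It assumes the default YOLOv8 setup: a square 640×640 input and three output scales with strides 8, 16 and 32. The paper's own input size is not stated in this caption, and the function names are hypothetical.

```python
# Minimal sketch: number of prediction cells produced by a YOLOv8-style Head.
# Assumptions (default YOLOv8 settings, not taken from this paper): square
# input and three detection scales with strides 8, 16 and 32.


def head_grid_sizes(input_size, strides=(8, 16, 32)):
    """Return (stride, grid side length) for each scale the Neck feeds to the Head."""
    sizes = []
    for s in strides:
        if input_size % s:
            raise ValueError(f"input size {input_size} is not divisible by stride {s}")
        sizes.append((s, input_size // s))
    return sizes


def total_predictions(input_size, strides=(8, 16, 32)):
    """Anchor-free head: one prediction (box, plus keypoints in the pose variant) per cell."""
    return sum(side * side for _, side in head_grid_sizes(input_size, strides))


if __name__ == "__main__":
    for stride, side in head_grid_sizes(640):
        print(f"stride {stride:2d}: {side}x{side} grid")
    print("total predictions:", total_predictions(640))
```

For a 640×640 input this gives 80×80, 40×40 and 20×20 grids, or 8400 predictions in total. The Neck supplies fused features at each of these three resolutions.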

Fig 2.

Decomposition of the large-kernel convolution. The blue grid represents the convolution kernel, and the green grid represents the center point.

The large-kernel convolution is decomposed into a depthwise convolution, a depthwise dilated convolution, and a pointwise (1×1) convolution.
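The decomposition can be checked numerically. The following is a minimal pure-Python sketch that assumes the kernel sizes of the original LKA design: a 5×5 depthwise convolution, a 7×7 depthwise convolution with dilation 3, and a 1×1 pointwise convolution. The paper's exact sizes are not given here, and all function names are hypothetical. Sending an impulse through the two depthwise stages measures the receptive field. `param_counts` compares the weights of the decomposition with those of a dense 21×21 convolution.

```python
# Minimal pure-Python sketch of the large-kernel decomposition shown in Fig 2.
# Assumption: kernel sizes follow the original LKA design (5x5 depthwise,
# 7x7 depthwise dilated with dilation 3, then a 1x1 pointwise conv); the
# exact sizes used in the paper are not stated in this caption.


def conv2d_same(img, kernel, dilation=1):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    h, w, k = len(img), len(img[0]), len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    y, x = i + (a - c) * dilation, j + (b - c) * dilation
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * img[y][x]
            out[i][j] = acc
    return out


def lka_attention(x, dw_kernels, dwd_kernels, pw_weights, dilation=3):
    """x is a list of C channels (each H x W). Returns attention * x, elementwise."""
    n_ch, h, w = len(x), len(x[0]), len(x[0][0])
    # Depthwise conv: one small kernel per channel, no channel mixing.
    dw = [conv2d_same(ch, k) for ch, k in zip(x, dw_kernels)]
    # Depthwise dilated conv: still per channel, with gaps between taps.
    dwd = [conv2d_same(ch, k, dilation) for ch, k in zip(dw, dwd_kernels)]
    # Pointwise (1x1) conv: mixes channels at every pixel (C_out equals C here).
    attn = [[[sum(pw_weights[o][c] * dwd[c][i][j] for c in range(n_ch))
              for j in range(w)] for i in range(h)] for o in range(n_ch)]
    # The result acts as an attention map multiplied into the input.
    return [[[attn[c][i][j] * x[c][i][j] for j in range(w)] for i in range(h)]
            for c in range(n_ch)]


def receptive_field(k_dw, k_dwd, dilation):
    """Side length of the input window seen by one output pixel after DW then DW-D."""
    return k_dw + (k_dwd - 1) * dilation


def param_counts(channels, k_full=21, k_dw=5, k_dwd=7):
    """Weights (biases ignored): dense k_full x k_full conv vs. the three-step decomposition."""
    dense = k_full * k_full * channels * channels
    decomposed = (k_dw * k_dw + k_dwd * k_dwd) * channels + channels * channels
    return dense, decomposed


if __name__ == "__main__":
    n = 31
    impulse = [[0.0] * n for _ in range(n)]
    impulse[n // 2][n // 2] = 1.0
    ones5 = [[1.0] * 5 for _ in range(5)]
    ones7 = [[1.0] * 7 for _ in range(7)]
    resp = conv2d_same(conv2d_same(impulse, ones5), ones7, dilation=3)
    width = sum(1 for v in resp[n // 2] if v != 0.0)
    print("measured receptive field:", width)                          # 23
    print("formula:", receptive_field(5, 7, 3))                        # 23
    print("weights for C=64 (dense, decomposed):", param_counts(64))  # (1806336, 8832)
```

With these assumed sizes, the two depthwise stages cover a 23×23 window, which is enough to emulate a 21×21 kernel. For 64 channels they need 8,832 weights, compared with 1,806,336 for a dense 21×21 convolution.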

Fig 3.

Structure of DLKA, which comprises the DLKA-Attention module and the FFN module. The DLKA-Attention module is built mainly around the DLKA module, which consists of deformable depthwise convolutions and deformable dilated convolutions.

The FFN module is composed of deformable convolutions.
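Deformable convolutions differ from ordinary ones in how they sample the input. This minimal single-channel sketch assumes the standard deformable-convolution formulation: each kernel tap is displaced by an offset, and the input is read at the resulting fractional position by bilinear interpolation. In a real module a separate convolution predicts the offsets; here they are passed in directly, and the helper names are hypothetical. Applying the function to each channel independently gives a deformable depthwise convolution. Setting `dilation > 1` gives the deformable dilated variant.

```python
import math


def bilinear(img, y, x):
    """Sample img at fractional (y, x); neighbours outside the image count as zero."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    val = 0.0
    for yy, wy in ((y0, 1 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * img[yy][xx]
    return val


def deform_conv2d(img, kernel, offsets, dilation=1):
    """
    Single-channel deformable convolution with 'same' output size.
    offsets[i][j][n] = (dy, dx) shifts the n-th kernel tap (row-major) at output (i, j).
    With all offsets zero this reduces to an ordinary zero-padded convolution.
    """
    h, w, k = len(img), len(img[0]), len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    dy, dx = offsets[i][j][a * k + b]
                    # Regular (possibly dilated) grid position plus a learned shift.
                    y = i + (a - c) * dilation + dy
                    x = j + (b - c) * dilation + dx
                    acc += kernel[a][b] * bilinear(img, y, x)
            out[i][j] = acc
    return out


if __name__ == "__main__":
    img = [[float(4 * r + col) for col in range(4)] for r in range(4)]
    ident = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    # Shift every tap half a pixel to the right at every output position.
    half = [[[(0.0, 0.5)] * 9 for _ in range(4)] for _ in range(4)]
    print(deform_conv2d(img, ident, half)[1][1])  # 5.5, the mean of img[1][1] and img[1][2]
```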

Fig 4.

Structure of the C2F-SimDLKA module. After the CBS block, the features are split into two parts: one part is kept without further processing, and the other passes through a series of SimDLKA modules.

The output of each SimDLKA module follows two paths: one feeds the next SimDLKA module, and the other is retained for the final concatenation. After n SimDLKA modules, all retained features are concatenated and fused.
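The split-and-concatenate data flow can be sketched with each channel represented as a list entry. This follows the Ultralytics C2f convention, in which both halves of the split appear in the final concatenation. `block` is a hypothetical stand-in for a SimDLKA module, whose internals are not given in this caption.

```python
def c2f_like(features, n, block):
    """
    features: list of 2*c channels (output of the first CBS block).
    block: stand-in for one SimDLKA module, mapping c channels -> c channels.
    Returns the (2 + n) * c channels that the final CBS block would fuse.
    """
    if len(features) % 2:
        raise ValueError("expected an even number of channels to split")
    c = len(features) // 2
    kept, y = features[:c], features[c:]  # the first half bypasses every module
    outputs = [kept, y]
    for _ in range(n):
        y = block(y)          # passed on to the next module ...
        outputs.append(y)     # ... and also retained for the concatenation
    return [ch for part in outputs for ch in part]


if __name__ == "__main__":
    fused = c2f_like(list(range(64)), n=3, block=lambda chs: [v + 1 for v in chs])
    print(len(fused))  # 160 = (2 + 3) * 32
```

With 2c = 64 input channels and n = 3 modules, the final CBS block receives (2 + 3) × 32 = 160 channels. The channel count grows linearly with the number of SimDLKA modules.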

Fig 5.

Comparison of the curves of the DCIOU loss function and the CIOU loss function.
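The DCIOU formula is not given in this section, so only the CIOU baseline is sketched here. The sketch uses the standard CIoU definition: L = 1 − IoU + ρ²/c² + αv. Here ρ is the distance between box centres, c is the diagonal of the smallest enclosing box, v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))² measures aspect-ratio mismatch, and α = v / ((1 − IoU) + v). The (x1, y1, x2, y2) box format and the `eps` guard are assumptions of this sketch.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Squared centre distance, normalised by the squared enclosing-box diagonal.
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # close to 0 for identical boxes
    print(ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # about 1.4: IoU = 0, rho^2 / c^2 = 4 / 10
```

The second example shows why CIoU still produces a useful signal for non-overlapping boxes. IoU alone is 0, but the centre-distance term still grows with separation.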

Table 1.

AP values of different methods on the three datasets. The AP values comprise four indicators: AP@50, AP@75, AP@M, and AP@L.
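These indicators follow the naming of the COCO keypoint protocol, assuming that protocol is used here. In that protocol, predictions are matched to ground truth by object keypoint similarity (OKS) rather than box IoU. AP@50 and AP@75 count a match when OKS ≥ 0.5 and OKS ≥ 0.75 respectively. AP@M and AP@L restrict evaluation to medium instances (area between 32² and 96² pixels) and large instances (area above 96²). MPII is conventionally reported with PCKh instead, so the AP values on that dataset may follow a different procedure. A minimal OKS sketch follows; the per-keypoint sigmas are the COCO constants, and the function name is hypothetical.

```python
import math

# Per-keypoint sigmas from the COCO keypoint evaluation (17 keypoints, nose first).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """
    Object keypoint similarity between one predicted and one ground-truth pose.
    pred, gt: lists of (x, y); visibility: COCO flags (0 = not labeled);
    area: ground-truth object area in pixels (plays the role of s^2).
    """
    num, den = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:  # unlabeled keypoints do not count
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * sigma) ** 2
        num += math.exp(-d2 / (2 * k2 * (area + 1e-12)))
        den += 1
    return num / den if den else 0.0


if __name__ == "__main__":
    gt = [(100.0 + 3 * i, 200.0 + 5 * i) for i in range(17)]
    pred = [(x + 4.0, y - 3.0) for x, y in gt]  # every keypoint off by 5 px
    score = oks(pred, gt, [2] * 17, area=96.0 ** 2)
    print(round(score, 3), "match@50:", score >= 0.5, "match@75:", score >= 0.75)
```

Because the error is scaled by object area and by a per-keypoint sigma, the same 5-pixel error costs more on small people and on precise keypoints such as the eyes than on the hips.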

Table 2.

FPS values of different methods on the three datasets.

Fig 6.

AP values of different methods on the three datasets, including four indicators: AP@50, AP@75, AP@M, and AP@L.

Blue represents AP@50, orange represents AP@75, green represents AP@M, and red represents AP@L.

Fig 7.

FPS values of different methods on the three datasets, where red represents the COCO dataset, yellow represents the MPII dataset, and green represents the HP dataset.

Table 3.

AP values of three methods on the three datasets, including four indicators: AP@50, AP@75, AP@M, and AP@L. Experiment one is the baseline model without additional modules; experiment two adds the LKA module to the baseline; experiment three adds the SimDLKA module to the baseline.

Table 4.

AP values of three loss functions on the three datasets, including four indicators: AP@50, AP@75, AP@M, and AP@L.
