Fig 1.
Schema of the proposed framework, showing the process by which motion primitives are discovered from video sequences.
Fig 2.
The six groups partitioning the human body with respect to motion primitives are shown, together with the joints specifying each group and the skeleton hierarchy inside each group. Joints in yellow are the parent joints in the skeleton hierarchy.
Fig 3.
Sequences of joint positions, for each skeleton group, after the root-sequence normalization described in Section 3.
Position data are in cm. The green points show the data of the innermost joint of the group (e.g. the hip for the leg); the yellow points show the data of the intermediate joint (e.g. the knee for the leg); the red points show the data of the outermost joint (e.g. the ankle for the leg). The joint data are collected from the datasets described in Section 6.
Fig 4.
Overview of the motion primitive discovery and recognition framework.
Primitives of the group ‘Arm’ from six different categories are shown on the left. Primitives are discovered by maximizing the motion flux energy function, presented here on the left side of the colored bar without its velocity and length components. These sets of primitives are used to train the hierarchical models for each category. Primitives are then recognized according to the learned models. The recognized motion primitive categories are depicted with different colors. On the right, the group motion in the corresponding interval is shown.
Fig 5.
Left: Motion flux of three motion primitives of group G3 labeled as ‘Elbow Flexion’, discovered from video sequences taken from the ActivityNet dataset. Right: Motion primitives before and after the normalization. For clarity, only the curve of the outermost joint is shown.
Fig 6.
Number k of components for groups G1, G2 and G3.
Values of k are computed by adjusting α so as to maximize the posterior p(α, Gm), given the data, namely the sampled primitives in the groups.
Fig 7.
Transposed feature vector of 3 contiguous sampled points on the decimated trajectory.
Table 1.
Average Hausdorff distance to each class representative in G2.
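As a rough illustration of the quantity reported in Table 1, the sketch below computes the symmetric Hausdorff distance between two sampled trajectories and averages it over a set of primitives against one class representative. It assumes that primitives are given as lists of 3D points and that "average" means the arithmetic mean of the per-primitive Hausdorff distances; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def average_hausdorff(primitives, representative):
    """Mean Hausdorff distance of a set of primitives to one representative.

    Assumption: the average is a plain arithmetic mean over the primitives.
    """
    distances = [hausdorff(p, representative) for p in primitives]
    return sum(distances) / len(distances)

# Toy data (hypothetical): one representative and two sampled primitives.
representative = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
primitives = [
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],  # shifted by 1 along y
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 3.0)],  # last point lifted by 3
]
avg = average_hausdorff(primitives, representative)
print(avg)
```

In this toy case the two primitives lie at Hausdorff distances 1 and 3 from the representative, so the mean is 2.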
Fig 8.
Manifold generated by a component of the DPM model for Elbow flexion on the left and from a component of Shoulder abduction on the right.
Fig 9.
Total number of discovered primitives for each group for the five most general categories of the ActivityNet dataset.
Clockwise from top-left: Eating and Drinking Activities; Sports, Exercise, and Recreation; Socializing, Relaxing, and Leisure; Personal Care; Household Activities. Each color corresponds to a different group following the convention of Fig 12. Note: The axis scale is shared among the plots.
Table 2.
Total number of unlabeled primitives discovered for each group using the motion flux on the reference datasets.
Fig 10.
Examples of synthetic motion primitives: right arm Shoulder abduction (first row) and Elbow flexion (second row), left leg Hip abduction (third row) and Knee flexion (fourth row).
For each synthetic motion primitive, the four poses shown are representative poses extracted from the animation of that primitive.
Table 3.
Accuracy of discovered primitive endpoints (in number of frames).
Fig 11.
Arc length distribution of original and scaled primitives of a specific category for group G1 (left) and G4 (right).
The first box in each box plot corresponds to the original arc length distribution; the next four are the arc length distributions obtained by scaling the original primitive data using the detailed scaling factors. Each box spans the central 50% of the trajectory data: the bottom and top of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and the crosses are the outliers.
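The relation between scaling and arc length summarized in Fig 11 can be sketched as follows. This is a minimal illustration, assuming a primitive is a polyline of 3D joint positions and that scaling is a uniform multiplication of the coordinates about the origin; the scaling factors and the trajectory are hypothetical, not the ones used in the paper.

```python
import math

def arc_length(trajectory):
    """Sum of Euclidean distances between consecutive sampled points."""
    return sum(math.dist(p, q) for p, q in zip(trajectory, trajectory[1:]))

def scale(trajectory, factor):
    """Uniformly scale every coordinate (assumption: scaling about the origin)."""
    return [tuple(factor * c for c in point) for point in trajectory]

# Hypothetical trajectory with segment lengths 5 and 12.
trajectory = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
base_length = arc_length(trajectory)

# Hypothetical scaling factors; arc length grows linearly with the factor.
scaled_lengths = {s: arc_length(scale(trajectory, s)) for s in (0.5, 1.5, 2.0)}
print(base_length, scaled_lengths)
```

Under uniform scaling by a factor s, every segment length is multiplied by s, so the whole arc length distribution shifts by the same factor.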
Fig 12.
Diagram showing the motion primitives of each group.
Abbreviation ext stands for external, int for internal, rot for rotation, exten for extension, and flex for flexion.
Fig 13.
Confusion matrices for motion primitive recognition.
The matrices for G1 and G2 are shown at the top, G3 and G4 in the middle, and G5 and G6 at the bottom.
Table 4.
Primitive recognition accuracy and ablation study.
Fig 14.
Distribution of the 69 primitives for the five most general categories of the ActivityNet dataset.
Clockwise from top-left: Eating and Drinking Activities; Sports, Exercise, and Recreation; Socializing, Relaxing, and Leisure; Personal Care; Household Activities. Each color corresponds to a different group following the convention of Fig 12.
Table 5.
Comparison with the 22 motion primitives of [3].
Table 6.
Datasets used for primitive computation in dangerous behavior detection.
Fig 15.
Results of the proposed method on videos from the UCF-crime dataset.
From top: Abuse, Fighting. The colored window shows the ground-truth anomalous region.
Fig 16.
Results of the proposed method on videos from the UCF-crime dataset.
From top: Shooting, Normal. The colored window shows the ground-truth anomalous region.
Fig 17.
Video frames from the Hockey and Movies datasets [83], with human meshes fitted using HMR.
Fig 18.
Frequency graphs of the occurrences of primitives for groups G2 (torso) and G3 (right arm) in the Abuse, Fighting, Robbery, and Shooting videos of the UCF-crime dataset.
Fig 19.
ROC curves of the proposed method for the UCF-crime, UCF101, Hockey and Movies datasets.
Table 7.
Comparison with state-of-the-art methods on the datasets Movies, UCF101 and Hockey.
Table 8.
AUC comparison with state-of-the-art methods on the UCF-crime dataset.
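For readers unfamiliar with the AUC figures compared in Table 8, the sketch below shows one standard way to obtain the area under a ROC curve from binary labels and anomaly scores, using its equivalence with the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The labels and scores are hypothetical toy values, and this is not the evaluation code used in the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over all (positive, negative) pairs, how often the positive
    sample has the higher score; ties count as one half.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: 1 marks an anomalous segment, 0 a normal one.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

Here three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. The pairwise form is quadratic in the number of samples, which is fine for illustration; large evaluations would typically use a sort-based computation.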