Fig 1.
System architecture and information refinement.
Gray boxes and arrows indicate system components and their interactions. The rounded rectangles illustrate the hardware platform. Starting from the on-body sensor network, the information is gradually refined from sensor-level through object-level to action-level information (red italic labels). Finally, it is converted into instructions and messages that are comprehensible to the user.
Fig 2.
Industrially motivated test workflows.
From left to right: Nails & Screws, Labeling & Packaging, Ball valve.
Table 1.
The complexity of the datasets increases from (a) to (c).
Fig 3.
Hardware platform and sensing concept.
The hardware platform consists of the BSN and the HMD, which represent the major input and output devices of the system. Left: a schematic drawing of the BSN. Right: a user wearing the setup while performing the Ball valve workflow.
Fig 4.
Example camera frames from chest and overhead camera.
Left: the chest-mounted fisheye camera provides a good overview of the workspace, in which the user’s wrists (marked with green circles) are visible even when the arms are in a starting position. Right: the overhead RGBD sensor provides a narrower top view. The field of view covered by the RGBD sensor is indicated in the lower left fisheye view. By combining both views, the user activity can always be roughly tracked, while more detailed tracking is available during workspace interactions.
Fig 5.
Functional upper body model with segment frames (left), technical sensor frames (right), and DOF in parentheses.
In the nominal pose, all joints are aligned with the global body-centered frame of reference, G, which is position-less, aligned with gravity and assumed at rest. Sensor poses are given relative to the associated joint.
Fig 6.
Workspace characterization and monitoring workflow.
The workspace characterization and monitoring prototype allows building a map of the 3D environment, tracking an RGBD sensor using dense modeling of the workspace in terms of both depth and appearance, and segmenting outliers as foreground indicating the presence of new objects. Known objects are then recognized and tracked in 3D as tasks are executed.
Fig 7.
Overview of workflow recovery and monitoring framework.
From bottom to top: In step 1, object-wrist and user motion relational graph structures are generated from the scene and user monitoring information. In step 2, a histogram representation of bag-of-relations (BoR) over a sliding window is created from the generated relational graph structures. These BoR are the feature input for the recognition of atomic events during workflow monitoring (step 3).
Fig 8.
Spatiotemporal relational word.
Left: spatiotemporal relations (2D vector) extracted from the observed configuration of objects and wrist 3D positions in the workspace. These relations (edges in the graph) are of the same color. Right: quantized relations using the relational dictionary corresponding to a particular object-object and object-wrist pair. Different colors are assigned to different relations (edges).
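The quantization step in this caption can be illustrated with a minimal sketch, assuming that each relation is a 2D vector and that it is assigned to the nearest entry of the relational dictionary for its class pair. The nearest-neighbor rule, the function name `quantize_relation`, and all numeric values are assumptions made for illustration; the caption does not state how the dictionary is built or applied.

```python
import math

def quantize_relation(relation, dictionary):
    """Return the index of the dictionary word closest to a 2D relation vector.

    Nearest-neighbor assignment (Euclidean distance) is an assumption,
    not necessarily the rule used in the original system.
    """
    return min(range(len(dictionary)),
               key=lambda i: math.dist(relation, dictionary[i]))

# Hypothetical relational dictionary for one object-wrist pair.
dictionary = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]

# Hypothetical 2D relations observed in consecutive frames.
relations = [(0.1, 0.05), (0.45, 0.1), (0.05, 0.6)]

# One relational word (dictionary index) per observed relation.
words = [quantize_relation(r, dictionary) for r in relations]
```

Keeping one dictionary per class pair, as the caption describes, means the same function would be called with a different `dictionary` for each object-object or object-wrist pair.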
Fig 9.
Encoding functional wrist-object relationships.
Relations are accumulated separately for each pair of classes.
Fig 10.
Overlays visualized in the HMD.
Textual descriptions provide information on the current action and its position in the overall workflow. Relevant directions are indicated by 3D arrows. Relevant objects are highlighted by red circles (regions of interest). The yellow arrow at the right margin of the middle picture directs the attention of the user to the relevant part of the workspace, where in this case the electrical positioner is located and highlighted by a region of interest.
Fig 11.
Workflow learning and online monitoring and assistance system.
The left side of the diagram shows the pipeline for workflow learning. The right side shows the online monitoring and assistance system with the real-time interfaces between the building blocks. Here, the rounded rectangles mark the hardware components. BSN and low-level processing components feed into both learning and online monitoring.
Fig 12.
Images of the live monitoring system.
Fig 13.
BoR calculation with subdivided sliding windows.
In order to capture the temporal relations of before, during and after, the histogram representing the BoR is subdivided into three histograms representing the respective temporal relations.
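A minimal sketch of the subdivided histogram follows. It assumes that the sliding window holds one relational word index per frame and that the before, during, and after parts are three consecutive, roughly equal sub-windows; the caption does not specify how the window is split, so this partition, the function names, and the example data are hypothetical.

```python
def bag_of_relations(words, num_words):
    """Count how often each relational word occurs in a sequence."""
    hist = [0] * num_words
    for w in words:
        hist[w] += 1
    return hist

def subdivided_bor(window, num_words, parts=3):
    """Concatenate one histogram per sub-window (before, during, after).

    Splitting into equal consecutive thirds is an assumption for illustration.
    """
    n = len(window)
    bounds = [i * n // parts for i in range(parts + 1)]
    feature = []
    for start, end in zip(bounds, bounds[1:]):
        feature.extend(bag_of_relations(window[start:end], num_words))
    return feature

# Hypothetical window of six quantized relations over a dictionary of 3 words.
window = [0, 0, 1, 2, 2, 2]
feature = subdivided_bor(window, num_words=3)
```

The concatenated vector is three times as long as a single histogram, which is the price for keeping the temporal order of the relations within the window.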
Table 2.
Window-classification workflow monitoring performance [%].
The sliding window based average performance comparison (including standard deviations) using only SVM is shown for the leave-one-subject-out experiment. The proposed bag-of-relations (BoR), in different variants, is compared with the state-of-the-art image-based STIP (Space-Time Interest Points) features.
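The leave-one-subject-out protocol named in this caption can be sketched as follows: all sequences of one subject are held out for testing while the remaining subjects are used for training, and this is repeated for every subject. The function name and the subject assignments below are hypothetical; only the protocol itself is taken from the caption.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# Hypothetical assignment of six recorded sequences to four subjects.
subject_ids = [1, 1, 2, 3, 3, 4]
splits = list(leave_one_subject_out(subject_ids))
```

Averaging a score over these splits, and reporting its standard deviation, gives per-dataset figures of the kind summarized in the table.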
Table 3.
On-line workflow monitoring performance [%].
The sliding window based average on-line performance comparison (including standard deviations) is shown for the leave-one-subject-out experiment. The proposed bag-of-relations (BoR), in different variants, is compared with the state-of-the-art image-based STIP (Space-Time Interest Points) features.
Table 4.
Off-line workflow monitoring performance [%].
The sliding window based average off-line performance comparison (including standard deviations) is shown for the leave-one-subject-out experiment. The proposed bag-of-relations (BoR), in different variants, is compared with the state-of-the-art image-based STIP (Space-Time Interest Points) features.
Fig 14.
Predicted atomic events vs. ground-truth for the leave-one-subject-out evaluation of the Labeling & Packaging dataset.
Each figure represents the predictions of the atomic events in sequences belonging to the left-out subject (left to right, top to bottom: subjects 1, 2, 3, and 4). The bottom bars show the ground truth and the top bars show the prediction. The vertical lines separate two consecutive workflow sequences. Different colors indicate different atomic events.
Fig 15.
On-line workflow monitoring confusion matrices for leave-one-subject-out evaluation.
Columns represent the predicted atomic events, while rows represent the actual atomic events. The gray scale value encodes the positive rate from 0 to 1. Left to right: Nails & Screws, Labeling & Packaging, Ball valve.
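As a minimal sketch of how such a matrix can be computed, the code below counts (actual, predicted) pairs with rows as actual events and columns as predicted events, then divides each row by its total so that every entry lies between 0 and 1. The row normalization is an assumption about how the rate in the caption is obtained, and the event labels are hypothetical.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Rows index the actual event, columns the predicted event."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

def normalize_rows(counts):
    """Divide each row by its total; rows without samples stay at zero."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

# Hypothetical per-window labels for three atomic events.
actual = [0, 0, 1, 1, 1, 2]
predicted = [0, 1, 1, 1, 0, 2]
rates = normalize_rows(confusion_matrix(actual, predicted, 3))
```

With this normalization, the diagonal entry of each row is the fraction of windows of that event that were recognized correctly.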
Fig 16.
The plots illustrate the magnetic disturbances occurring during the Labeling & Packaging and the Nails & Screws workflows. In a disturbance-free environment, the norm of the measured magnetic field stays constant under arbitrary poses and motions. While the changes are moderate for Labeling & Packaging, significant changes are noticeable for Nails & Screws, especially in the forearm IMUs (RF IMU, LF IMU). This happens primarily when the hands approach or handle the hammer or the electrical screwdriver. The three chest camera frames correspond to the peak disturbances at seconds 11, 20, and 52 of Nails & Screws.
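The constant-norm property stated in this caption suggests a simple disturbance check, sketched below under the assumption that the plotted norm is that of the three-axis magnetometer reading. The relative tolerance, the reference norm, the function names, and the sample values are all hypothetical; the caption does not describe how disturbances are detected in the actual system.

```python
import math

def field_norm(sample):
    """Euclidean norm of one three-axis magnetometer reading."""
    return math.sqrt(sum(c * c for c in sample))

def disturbed_flags(samples, reference_norm, tolerance=0.1):
    """Flag readings whose norm deviates from the reference by more than
    `tolerance` (relative). The threshold is an illustrative assumption."""
    return [abs(field_norm(s) / reference_norm - 1.0) > tolerance
            for s in samples]

# Hypothetical normalized readings: two undisturbed poses and one reading
# taken near a ferromagnetic tool.
samples = [(0.6, 0.0, 0.8), (0.0, 1.0, 0.0), (1.2, 0.0, 0.9)]
flags = disturbed_flags(samples, reference_norm=1.0)
```

The first two readings point in different directions but share the same norm, so only the third is flagged; this mirrors the statement that the norm is invariant to pose and motion when no disturbance is present.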
Fig 17.
Effects of magnetic disturbances.
The plot illustrates the deviation between the 3D wrist position estimates resulting from the magnetometer-based and the wrist-detection-based approaches. The data correspond to the right wrist position estimated in a disturbance-free environment (Labeling & Packaging) and in a disturbed environment (Nails & Screws) for exemplary data sequences.
Fig 18.
Effects of magnetic disturbances in the image plane.
The plot shows the statistics of the deviations between the projected and the detected wrist positions resulting from the magnetometer-based and the wrist-detection-based approaches. The data correspond to the right wrist position estimated in an exemplary data sequence of the Nails & Screws workflow. The wrist-detection-based method clearly outperforms the magnetometer-based method.
Fig 19.
Effects of magnetic disturbances in the image plane.
The picture shows example views from the chest camera augmented with the projected wrist position estimates (ellipses) resulting from the magnetometer-based (upper row) and the wrist-detection-based (lower row) approaches during disturbances caused by handling the hammer and the electrical screwdriver. The ellipses indicate the projected uncertainties. The magnetometer-based approach results in clear deviations, while the wrist-detection-based approach shows good matches.
Fig 20.
Object recognition and tracking for Labeling & Packaging.
Two frames showing ground truth bounding boxes and the corresponding output of the method projected onto 2D images for the Labeling & Packaging dataset. Matching colors indicate correct identities. In the bottom right, the tape dispenser was detected but not labeled in the ground truth (counted as a false positive), and the pen was not recognized (a false negative).
Fig 21.
Object recognition and tracking for Ball valve.
Frames showing the Ball valve task for 6 sequences from 3 different operators across the task’s primitive events. False negative frames (yellow-bounded) indicate cases where the object was not recognized because of a significantly different grasp between learning and testing, or because of occlusion. False positive cases (red-bounded) are due to ambiguous grasps of objects, particularly the spanner and the screwdriver, when seen from an overhead camera.
Table 5.
Object recognition and tracking results for the Labeling & Packaging task.
The average precision and recall are computed over 2 runs per worker.
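For reference, precision and recall follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), and can be averaged over runs as in the minimal sketch below. The detection counts are hypothetical and are not taken from the table.

```python
def precision_recall(tp, fp, fn):
    """Standard precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical (true positive, false positive, false negative) counts
# for the two runs of one worker.
runs = [(8, 2, 0), (6, 0, 2)]
per_run = [precision_recall(*r) for r in runs]

# Average over the runs, as described for the table.
avg_precision = sum(p for p, _ in per_run) / len(per_run)
avg_recall = sum(r for _, r in per_run) / len(per_run)
```

Averaging per-run scores weights each run equally, regardless of how many objects were visible in it; pooling the counts before dividing would be an alternative with different weighting.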
Fig 22.
As the number of objects increases from 1 to 30, the library size increases by more than 150×, while the average detection time increases by 3.3× (plier), 4.8× (claw) and 5.5× (charger). For the object ‘tape’ (right), the average detection time increases by 10×, particularly when objects with a circular shape are learnt (headphone, mug, apple, and scissors).
Fig 23.
Confusion matrix showing the accuracy when objects are learnt from a different operator for the Nails & Screws (left) and the Labeling & Packaging (right) tasks.