A dataset of human and robot approach behaviors into small free-standing conversational groups

Analyzing and simulating the interactions that occur in group situations is important when humans and artificial agents, physical or virtual, share spaces and must coordinate or even collaborate, as in human-robot teams. Artificial systems should adapt to the natural interfaces of humans rather than the other way around. Such systems should be sensitive to human behaviors, which are often social in nature, and account for human capabilities when planning their own behaviors. A limiting factor is our incomplete understanding of how humans behave with respect to each other and with artificial embodiments, such as robots. To this end, we present CongreG8 (pronounced ‘con-gre-gate’), a novel dataset containing the full-body motions of free-standing conversational groups of three humans and a newcomer that approaches the groups with the intent of joining them. The aim has been to collect an accurate and detailed set of positioning, orienting and full-body behaviors when a newcomer approaches and joins a small group. The dataset contains trials from human and robot newcomers. Additionally, it includes questionnaires about the personality of participants (BFI-10), their perception of robots (Godspeed), and custom human/robot interaction questions. We also provide an overview and analysis of the dataset; the analysis suggests that human groups are more likely to alter their configuration to accommodate a human newcomer than a robot newcomer. We conclude by presenting three use cases to which the dataset has already been applied, in the domains of behavior detection and generation in real and virtual environments. A sample of the CongreG8 dataset is available at https://zenodo.org/record/4537811.

the future to include other group types and activities that may not share the same characteristics as those in the current dataset. This also includes additional HRI data collection activities in which the robot trajectories will be data-driven based on information from this dataset. The application of the dataset, and its scope, have been demonstrated in the use cases in references [39] and [52].
Comment: "The number of participants/groups is limited (40 participants/10 groups only).
Approaching behavior has been captured for the same group several times, which introduces its own biases as after a couple interactions the participants will already become acquainted."

Response:
We randomly assigned participants to participant pools (i.e., groups of four participants with roles rotating between game player and adjudicator) to reduce the effects of any acquaintance existing prior to the study. However, it is true that in any interaction like this, participants may become acquainted over the course of the session. To check for this, we have included an analysis (see S2 Table and Fig. 14) and accompanying text suggesting that Accommodate behaviors did not increase as the experiment progressed.
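One simple way to look for such a trend is to compute the fraction of Accommodate labels at each trial position and fit a least-squares slope against trial order. The sketch below is a minimal illustration, assuming labels are available as hypothetical `(trial_index, accommodated)` pairs; the function names and toy data are invented for illustration and are not the analysis reported in S2 Table and Fig. 14.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def accommodate_rate_by_trial(labels: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Fraction of Accommodate labels per trial index (hypothetical input format)."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # trial -> [accommodate, total]
    for trial, accommodated in labels:
        counts[trial][0] += int(accommodated)
        counts[trial][1] += 1
    return {t: a / n for t, (a, n) in sorted(counts.items())}

def least_squares_slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys on xs; > 0 means the rate rises with trial order."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct trial indices")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy labels, NOT taken from CongreG8: (trial_index, accommodated)
toy = [(1, True), (1, False), (2, False), (2, True), (3, True), (3, False)]
rates = accommodate_rate_by_trial(toy)
slope = least_squares_slope(list(rates), list(rates.values()))
print(rates, slope)
```

A slope near zero or below zero is consistent with no increase over time; this is only a descriptive check, not a significance test.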
Comment: "It is not clear from the paper what is aimed to be captured. If they aim to capture the tendency of a group of people to accept a newcomer, is the designed experiment appropriate for this aim?"
Response: The aim has been to collect an accurate and detailed set of behaviors related to small group interactions, especially data related to positioning, orienting and full-body motions as a newcomer approaches and joins the group. In the coffee break or poster session scenarios captured in previous datasets [7-9], for example, group-joining behaviors are very rare, even over long collection periods. Our scenario attempts to model a practical, real-world situation, in which group members are engaged in a conversation and do not know when or where the newcomer will approach to join the group, while also allowing as many samples of joining behavior as possible to be collected. We have clarified this aim in the abstract of the paper and the motivation for the scenario design in the last paragraph of the "Data collection scenario" section.
Comment: "The paper does not provide further insight into why humans prefer a human newcomer over a robot newcomer. This might be due to the robot's limited capabilities for maintaining interaction during the game. The description of the robot control is very brief in the paper, and it is not clear what behaviour is automatic/what behaviour is controlled by the human operator."
Response: Some participants reported that they were not familiar with robots and were afraid the robot might collide with them. A Wizard-of-Oz (WoZ) approach was chosen because no reliable automatic control system exists for moving a robot into an appropriate position in a dynamic group situation. In WoZ, the robot's position is controlled by a teleoperator, which results in non-smooth trajectories.
CongreG8's main purpose is to provide data on human group approach interactions as a basis for training machine learning models that enable robots to approach groups of humans in a socially compliant manner, and ultimately to replace WoZ control. We have extended the description of robot control in the "Robot control" section of the paper.
Comment: "The dataset mainly focuses on motion capture data. What are the practicalities of motion capture data in real-life scenarios, where it is not possible to ask humans to wear mocap suits?
Also, how can this approach be implemented on a robot to perceive and approach a group?" Response: We chose full-body motion capture because, compared with extracting skeleton data from videos, it provides more accurate data with less occlusion and higher continuity. This makes the high-quality CongreG8 dataset well suited to training data-driven models. Once a model is trained, it could be applied to video input: skeleton extraction methods would first estimate the markers from the video, and these estimates would then be fed to the model. CongreG8 also provides video streams for researchers who prefer to train models on video.
Regarding how this could be implemented on a robot to perceive the group, we have demonstrated an approach in [52] in which full-body marker data (from surrounding cameras) are input into the trained model in real time. In the future, group information could potentially be extracted from the robot's own camera in real time, although doing so poses several additional challenges.
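Feeding a live marker stream into a trained model typically requires buffering recent frames. The sketch below illustrates one generic way to do this, assuming a fixed-length sliding window and a model that maps a window of frames to a single output; the class name, frame layout, and stand-in model are hypothetical and are not taken from the system in [52].

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence

Frame = Sequence[float]  # one frame of flattened marker coordinates (hypothetical layout)

class SlidingWindowPredictor:
    """Buffers the most recent `window` frames and runs `model` once the buffer is full."""

    def __init__(self, window: int, model: Callable[[List[Frame]], float]) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.buffer: Deque[Frame] = deque(maxlen=window)  # oldest frames drop out automatically
        self.model = model

    def push(self, frame: Frame) -> Optional[float]:
        """Add one streamed frame; return a prediction, or None while the window is filling."""
        self.buffer.append(frame)
        if len(self.buffer) < self.window:
            return None
        return self.model(list(self.buffer))

# Stand-in "model" (hypothetical): mean of the first coordinate over the window.
predictor = SlidingWindowPredictor(3, lambda w: sum(f[0] for f in w) / len(w))
outputs = [predictor.push([float(i)]) for i in range(5)]
print(outputs)
```

The bounded `deque` keeps memory constant and always holds the latest frames, which suits a stream whose length is not known in advance.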
Comment: "The authors should include a section summarizing the data statistics. It would be helpful for the reader to see some visualizations of how groups use the space, how much they move, how they move when a newcomer joins the group, etc. Further statistics regarding the distribution of labels, in particular personality, accommodate and ignore, should be presented in the paper."

Response:
We have included a new table (S1 Table) in the supporting documentation that summarizes the statistics of the overall dataset. Statistics regarding the distribution of labels and visualizations of groups have been added to the paper, including a new figure (Figure 10) with a heatmap visualizing the positions of all group members across all trials in the dataset, as well as sample images of the corresponding full-body behaviors and newcomer trajectories. As mentioned in a previous response, we have also added a new table (S2 Table) in the supporting documentation summarizing the distribution of personality and group behavior labels, together with a new figure (Figure 14) showing the relationship between Accommodate labels and personality.
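A position heatmap of this kind is, at its core, a 2D histogram: floor-plane positions are binned into square cells and counted. The sketch below shows that binning step under stated assumptions (positions as `(x, y)` pairs in metres, a hypothetical plotting area and cell size); it is not necessarily how Figure 10 was produced, and the toy samples are invented.

```python
import math
from typing import Iterable, List, Tuple

def occupancy_grid(
    positions: Iterable[Tuple[float, float]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell: float,
) -> List[List[int]]:
    """Count floor-plane positions per square cell; grid[row][col] with row along y."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    nx = math.ceil((x_max - x_min) / cell)
    ny = math.ceil((y_max - y_min) / cell)
    grid = [[0] * nx for _ in range(ny)]
    for x, y in positions:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # ignore samples outside the plotted area
        col = min(int((x - x_min) // cell), nx - 1)  # clamp guards against float rounding
        row = min(int((y - y_min) // cell), ny - 1)
        grid[row][col] += 1
    return grid

# Toy positions in metres, NOT taken from CongreG8.
samples = [(-0.5, -0.5), (0.5, 0.5), (0.5, 0.2), (5.0, 0.0)]
print(occupancy_grid(samples, (-1.0, 1.0), (-1.0, 1.0), cell=1.0))
```

The resulting count grid can be passed to any image or heatmap plotting routine; a smaller `cell` gives finer spatial detail at the cost of sparser counts.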
Comment: "The authors should discuss the following datasets/papers in the related work section:
• JRDB: A Dataset and Benchmark for Visual Perception for Navigation in Human Environments"
Response: Thank you for the references. They have been added to the paper, together with some explanatory text, in the "Group interaction datasets" section.

Reviewer #2:
Comment: "The related work lists the relevant literature.
There are a few papers that the authors could add to the "Group interaction research" section if they believe it adds to their paper: …"