Task dynamics define the contextual emergence of human corralling behaviors

Social animals have the remarkable ability to organize into collectives to achieve goals unobtainable to individual members. Equally striking is the observation that despite differences in perceptual-motor capabilities, different animals often exhibit qualitatively similar collective states of organization and coordination. Such qualitative similarities can be seen in corralling behaviors involving the encirclement of prey that are observed, for example, during collaborative hunting amongst several apex predator species living in disparate environments. Similar encirclement behaviors are also displayed by human participants in a collaborative problem-solving task involving the herding and containment of evasive artificial agents. Inspired by the functional similarities in this behavior across humans and non-human systems, this paper investigated whether the containment strategies displayed by humans emerge as a function of the task’s underlying dynamics, which shape patterns of goal-directed corralling more generally. This hypothesis was tested by comparing the strategies naïve human dyads adopt during the containment of a set of evasive artificial agents across two disparate task contexts. Despite the different movement types (manual manipulation or locomotion) required in the different task contexts, the behaviors that humans display can be predicted as emergent properties of the same underlying task-dynamic model.


Introduction
Social animals have the extraordinary capacity to structure their activity in coordination with other members of a larger group. The resultant behaviors that emerge at the collective level display key features of self-organizing systems [1], whereby interactions among individuals give rise to functionally organized, coordinated behavioral patterns. These patterns show a remarkable degree of qualitative similarity across species [2][3][4][5][6]. For example, similar herding and containment (i.e., corralling) behavior has been observed during group hunting by wolves [7] and certain cetaceans [8][9][10]  agents acting in different environment and task contexts exploit similar dynamical rules to achieve task success.

Background
Corralling behaviors in humans. Previous research has utilized corralling task paradigms in a virtual reality framework to explore human dyadic coordination and problem-solving [11,12,17,18]. The task was presented as a video game displayed on a large tabletop display (see Fig 1,left). Standing on opposite sides of the display, participants controlled herding agents (HAs) which would repel nearby TAs. The goal was to keep the TAs from fleeing the game field by containing them within a red circular region during one-minute trials (see [11] for more details, as well as recent implementations [12,17]). When left unperturbed, the TAs exhibited Brownian motion which required active movements by participants to keep them contained within the red region (else they would easily disperse).
While completing the task, dyads engaged in a behavior termed search and recover (S&R) (see Fig 1,center), which involved dyads subdividing the game field (global 'task-space') in half and each participant selecting and retrieving the TA farthest from the containment location on their side of the task space. Although this S&R strategy can result in herding all TAs into the containment region, the evasive nature of the autonomous TAs made containing them within this region quite difficult, if not impossible, when the number of TAs increased (e.g., from 3 to 7), with most pairs failing to contain the TA herd using only S&R behavior. Accordingly, some dyads discovered and adopted a much more efficient and effective encirclement strategy to contain the TA herd once the TAs were initially corralled into the containment region using S&R. Instead of each participant pursuing and retrieving individual TAs that escaped the containment perimeter on their respective sides of the task space, both participants would perform coupled, oscillatory movements along their respective half-perimeter (see Fig 1,right). Once discovered, this behavioral containment strategy (coupled oscillatory containment [COC]) was immediately implemented by pairs in subsequent trials and led to near-optimal task performance [11]. Moreover, in dyads that discovered the COC strategy, if one participant was left to contain the herd by themselves (e.g., due to the other participant leaving to collect and retrieve a new TA that appeared outside the containment region), that remaining participant would transition to produce circling behaviors around the entire herd [12].
During a post-experiment debriefing interview, dyads who transitioned to COC behavior attributed their discovery to a moment of cognitive insight-i.e., a sudden cognitive reorganization of how to approach the problem [19]. Anecdotally, some participants reported that adopting this solution to solving the task entailed learning to ignore the motions of the contained TAs and, instead, to simply focus on maintaining rhythmic oscillations with their partner. These reports are supported by the stable modes of entrained movements observed in COCdiscoverers [11,17], namely, participants became attracted to producing either in-phase (0˚) or anti-phase (180˚) coordinated oscillations with their partner. These two coordinative modes of relative phasing are consistent with those displayed across a wide range of bimanual (within-agent) and social (within-dyad) rhythmic coordination tasks [20][21][22][23][24]. Additionally, the discovery and use of COC behavior altered participant eye-movement dynamics, resulting in longer, sustained fixations during COC as opposed to S&R behavior [25]. Collectively, these findings imply that dyads discover a new strategy-namely, to produce coupled rhythmic movements with their partner.
Task-dynamic models of corralling evasive agents. The ability to retrieve and contain a set of objects necessitates satisfying task-defined control requirements, which can be formalized using a task-dynamic model [14][15][16]. Rooted in dynamic systems theory, complexity science and ecological psychology [26,27], the use of task dynamics and its related formulation, behavioral dynamics [16], provide a framework for relating intentional individual and collective behavior across animal-environment systems [14,28]. Task-dynamic models utilize simple mathematical functions or rules (e.g., differential equations) to capture how goaldirected actions by agents unfold at an abstract, low-dimensional level of description [15,16,29,30]. Task-/behavioral-dynamics modeling has previously been applied to human singleagent behaviors such as reaching [15], walking and object avoidance [31], as well as human multiagent activities that entail inter-agent coordination of limb movements [22,32], sorting and passing objects [33], and crowd motion [6].
To model the corralling behavior of human participants that has been observed in previous studies, the task's dynamics were specified with regard to the goal of minimizing the distance of the set of TAs to a containment region, C. For convenience, the task's dynamics can be defined using a polar coordinate task-space (r,θ), with (r C ,θ C ) denoting the center of C located at the polar origin. The current positions of each HA i (where i = 1, 2 for a human dyad) and their respective TA to pursue at time t, TA (t),i , can be defined in polar coordinates, respectively, as (r HA i ; y HA i ) and (r TA ðtÞ;i ; y TA ðtÞ;i ) (see Fig 2). The containment region can be defined in two ways-either as a fixed location on the game field ðx C 1 ; x C 2 Þ, or as a time-varying location centered on the TAs' mean position ðx Once a TA is selected by HA i at a given time t, HA i begins to retrieve that TA by moving to a task-space location that is slightly radially beyond the task-space location of the selected TA , ðr; yÞ ¼ ðr TA ðtÞ;i þ r min ; y TA ðtÞ;i Þ. The radial offset, r min (where r min >0), ensures the TA is repelled towards C.
The attraction of a given HA to the location of the TA can be captured using point attractor dynamics. This can be achieved using a damped-mass spring function for both the radial and angular components of the HA's movement, For Eq 1, r, _ r and € r represent the radial position, velocity and acceleration, respectively, of HA i with respect to C; r TA ðtÞi is the radial position of the TA that HA i is currently pursuing at time t; r min specifies HA i 's preferred radial distance from the selected TA; and b r and ε r are free parameters that vary the damping and stiffness, respectively, of the radial force attracting HA i from r HA i to ðr TA ðtÞi þ r min Þ. Eq 2 mirrors Eq 1 but controls the angular force attracting y HA i towards y TA ðtÞi .
When faced with multiple TAs to select from, a HA is hypothesized to implement the intuitive rule of targeting and moving towards the TA that is (i) farthest from C and (ii) moving away from C [11,12,17]. In the case of multiple HAs, each HA implements the rule of selecting the farthest TA that belongs to a subset of TAs closer to one's own position compared to their partner's. These TA selection rules can be formalized as the following, For Eq 3, the TA to pursue at time t, TA (t),i , is a member of the subset of TAs, H i , which are closer to HA i than their partner (see Method for more detail). From this subset, TA (t),i is the TA who will be farthest from C, where x TA ðtÞi and _ x TA ðtÞi represents the position and velocity vectors, respectively, of TA (t),i in relationship to C in Cartesian space. The value _ x TA ðtÞi Dt represents the positional increment added to x TA ðtÞi at time t to predict TA (t),i 's position at t+Δt. When incorporating a TA selection rule such as Eq 3, the task-dynamic model defined in Eqs 1 and 2 is sufficient to generate S&R behavior. However, the dynamics defined by Eqs 1 and 2 cannot produce either the coupled oscillatory containment (COC) behaviors adopted by human dyads [11,17] or the circling behaviors observed by individual human participants [12]. To account for COC behavior by dyads, the angular dynamics defined by Eq 2 can be modified to capture both the oscillatory nature of COC, as well as the stable patterns of interparticipant entrainment. This can be done by converting the damped mass-spring dynamics to limit-cycle oscillatory dynamics, and by adding an appropriate inter-HA coupling function, The inclusion of the nonlinear terms b y _ y 3 HA i þ g y y 2 HA i _ y HA i enables oscillatory, limit-cycle dynamics when b θ <0. The coupling term ð _ y HA i À _ y HA j ÞðA þ Bðy HA i À y HA j Þ 2 Þ; when |4B|>|A|, couples the angular dynamics of HA i to its partner HA j to produce the stable in-phase or anti- , that is both i) a member of the subset of TAs closer to itself than to its HA partner and ii) is predicted to be the farthest TA in this subset from the center of the containment region at time at time t+Δt. This subset can be visualized using the boundary � B � , defined as the perpendicular bisector of a line drawn between the HAs; the TAs falling on HA i 's side of � B � comprise the set of potential targets for HA i . Parameter r min represent an offset to ensure TAs are repelled towards C, and r Δ represent a decision boundary that determines the minimum distance of the selected TA that elicits herding behaviors by an HA. See text for more details. https://doi.org/10.1371/journal.pone.0260046.g002

PLOS ONE
Task dynamics define the contextual emergence of human corralling behaviors phase coordinative patterns observed in previous research [17] and in rhythmic coordination more generally [20,21,34]. A more detailed discussion of Eq 4 can be found elsewhere [17,35].
In the case of an individual HA producing circling movements to keep the TA herd contained, Eq 2 can be modified as follows, When b θ <0, an HA implementing Eq 5 will circle around the task-space origin at a rate of � ffi ffi ffiffi b y b y q rad � s À 1 with the direction specified by the HA's initial angular velocity [15]. Further, to accommodate the observation that participants will perform COC or circling behaviors at a fixed distance from the task-space origin, Eq 1 can also be modified to replace ðr TA ðtÞ;i þ r min Þ with r Δ , where r Δ represents an HA's preferred distance from C. Both Eqs 4 and 5 have been successfully implemented into the control architecture of an artificial HA capable of working alongside naïve participants to contain TAs to a singular location [17], as well as transporting between multiple locations and adapting to the introduction of new TAs during a trial [12]. Current study. The transition from S&R behavior to containment and circling behaviors is driven by a change in how HAs are coupled to the task environment. In a task-dynamic framework, these changes in behavior are induced by corresponding graph-dynamic [30] changes in the compositional structure of the system equations that underlie the observed behavior (Eqs 1 and 2 for S&R, and Eqs 4 and 5 for containment and circling behavior, respectively). The discovery of containment strategies by participants in previous research may reflect context-dependent interactions that facilitate the detection of these behavioral possibilities. In this way, Eqs 4 and 5 may exist as latent properties revealed by the movement patterns shaped by Eqs 1 and 2. Accordingly, when engaged in a particular corralling task context, human actors are hypothesized to exploit these latent properties and subsequently implement these dynamics intentionally [28,36].
Inspired by the similitude in the containment and encirclement strategies observed during human [11,12,17], as well as animal [7][8][9][10] and non-biological systems [13], the current study evaluated whether COC and circling behaviors can be understood more generally as invariant, emergent properties of human dyadic corralling behaviors. As opposed to limiting the behaviors of participants to hand movements on a tabletop display, participants were embodied in an immersive virtual reality environment in which they had to locomote across a large space to corral and contain the fleeing TA herd (see Fig 3). Model-based simulations were also conducted to determine whether the emergent behaviors that participants adopt in this current experiment, as well as in previous research [11,12,17], can be predicted as latent properties of the task-dynamic model.

Results
Human naïve participants, recruited as dyads, were tasked to keep a set of seven TAs (modeled as spheres with radius 0.24 m) contained within 0.72 m of the TA herd's mean position on a game field measuring 6 × 3.48 m (see Fig 3). Dyads were exposed to one of three difficulty conditions, which manipulated the maximum speed the TAs could move (� 0.12 m�s -1 , � 0.20 m�s -1 , � 0.28 m�s -1 ). Dyads were given 45 minutes to solve the task across a series of two-minute trials. During a trial, dyads had to prevent any TA from fleeing the field. Additionally, participants had to contain the TAs within the containment criteria for at least 70% of the last 45 seconds of the trial. Dyads who could meet these criteria on eight separate trials were deemed successful.
In total, 39 dyads participated in the experiment. These dyads were equally distributed across low-(� 0.12 m�s -1 ), medium-(� 0.20 m�s -1 ) and high-(� 0.28 m�s -1 ) TA maximal speed conditions. From the 39 dyads, 26 (66.67%) dyads met the experiment success criteria. Most failures resulted from dyads who were assigned to the high-speed condition (3 [23.08%] dyads reached eight successful trials), followed by the medium-speed condition (10 [76.92%] dyads), whereas all dyads in the low-speed condition succeeded. Within these successful dyads, the time required to complete the experiment was 16.91 (SD = 1.23) minutes, 24.83 (SD = 4.73) minutes, and 32.02 (SD = 4.73) minutes for the low-, medium-and high-speed condition, respectively. Overall, the range of task difficulty was sufficiently broad so that the low-speed condition was trivially easy, and the high-speed condition was very difficult.
Humans discovered coordinated circling as an effective corralling strategy that was sensitive to task difficulty Across conditions, successful dyads discovered that an effective means to complete the task was to perform circling movements around the TA herd. Representative examples of these movement patterns and how they varied across task difficulty conditions are shown in Fig 4. Across successful containment trials, dyads performed 5.99 (SD = 4.04), 7.62 (SD = 4.24), and 13.15 (SD = 4.81) cumulative 2π rotations in the low-, medium-, and high-speed difficulty conditions, F(2, 31) = 7.12, p = .003, Z 2 p = .32. The number of rotations completed by dyads in the high-speed condition was greater than those in the low-(p = .002) and medium-speed (p = .02) conditions, where pairwise comparisons for this and all other tests were Bonferroni-corrected. As shown in Fig 4, dyads in the lower speed conditions often exhibited intermittent cycling or cycling with frequent direction changes. This was in contrast with dyads in the high-speed condition who maintained a fixed rotation direction over the course of the entire trial. Despite these differences in locomotion cyclicity, participants remained coordinated in their behaviors by maintaining an 180˚(π radians) angle of separation with respect to the TA's mean position, t(33) = -0.31, p = .76, which did not differ between conditions, F(2, 31) = 0.99, p = .38, Z 2 p = .06. Additionally, dyad cycling direction preference was symmetrical-a particular dyad was equally likely to prefer a clockwise (CW) or counterclockwise (CCW) rotation around the TA herd (CCW rotation during 56.75% of successful trials, SD = 32.26), t(33) = 1.22, p = .23, d = 0.21). Interestingly, performance was worse on successful trials from dyads in the medium-speed condition as compared to those in the low-speed condition (p < .001). This decrease in performance may reflect an approach towards a critical point that destabilizes intermittent (CW/CCW) circling behavior as an effective solution to sufficiently control the TAs' movements. Although there was no difference in the number of rotations performed by dyads in the low-(M = 5.99 rotations, SD = 4.04) and medium-speed (M = 7.62, SD = 4.24) conditions (p > .99), dyads in the medium-speed condition did not restrict the TAs' distance travelled (M = 11.52 m, SD = 2.14) as much as dyads in the low-(M = 8.89 m, SD = 0.71) (p = .001) and high-speed (9.05 m, SD = 1.74) (p = .006) conditions, F(2, 31) = 9.96, p < .001, Z 2 p = .39. As the maximal speed of the TAs increased, a transition from intermittent to continuous cycling in a fixed direction appeared necessary to maintain control over the herd. Dyads in the high-speed condition managed to restrict the TAs' movements to the same degree as those in the low-speed condition (p > .99).

Model simulations revealed emergent dynamics that emulated the strategies humans adopt in different corralling task contexts
The cyclical behaviors human dyads produced while locomoting are different from the oscillatory behavior participants discover when using hand movements to control their respective HA (as opposed to the head's position used in the current experiment). A notable difference between the tabletop environment (Fig 1) and the environment employed here (Fig 3) is the effort required to traverse and change directions in the task fields for hand movements (field width = 1.17) [11,17] and locomotion (field width = 6 m). Model-based simulations were conducted to determine whether both circling and oscillatory behaviors could be understood as latent, emergent properties of the same task-dynamic model when appropriately parameterized to account for the differential constraints acting upon HAs in either context. In this way, the structure of the task-dynamic model reflects the constraints of the task while the model's parameterization reflects the physical constraints acting upon HAs to move about the task environment [37]. Specifically, the dynamics afforded by Eqs 1 and 2 can be constrained by the parameterization by the model's stiffness, ε, and damping, b, parameters, which relate to the conservative and dissipative forces acting upon the system, respectively.
Two artificial HAs completed the corralling task in the task environment used for the human experiment. The agents embodied the model defined by Eqs 1, 2 and the TA selection rule (Eq 3). Simulations were conducted across a range of stiffness and damping parameter values. The same parameter values were used for the radial and angular dynamics (i.e., ε = ε r = ε θ ; b = b r = b θ ). Plots summarizing the simulations are presented in Figs 5 and 6.
When underdamped (z<1), emergent oscillatory behavior can be observed in Fig 5 (bottom row). This oscillatory behavior was the result of a symmetry-breaking event once the TAs were contained, whereby the repulsions of the artificial HAs caused the TAs to move collectively in the same direction. This resulted in a stable pattern of behavior whereby the artificial HAs "oscillated" to negate the directed forces of the TA herd (see S4 Video). Note that these COClike movements were reactionary to the movements of the TAs and were not internally generated.
Although this emulated the behaviors observed when participants complete the corralling task using hand movements on a tabletop display [11,17], this behavior was not tenable when locomoting a large task environment. Specifically, when considering parameters that resulted in a mean peak frequency > 0.5 Hz, the cutoff criterion for COC behavior [11,17], the movements of the artificial HAs equated to approximately 2.67 m�s -1 (SD = 0.73), exceeding the transition threshold from walking to running in humans (~2.0 m�s -1 ) [38]. Given this behavior includes the reversals inherent in COC behavior, this speed is not maintainable for a substantial period (e.g., two-minutes).
However, as the movement speed of the artificial HAs decrease (i.e., due to overdamped dynamics, z>1), a different symmetry-breaking event occurred (Fig 5, top half). As opposed to exhibiting oscillatory-like behavior, the artificial HAs exhibited emergent circling behavior around the TA herd while maintaining an angle of separation of~180˚from each other. This behavior resulted from the artificial HAs' inability to move to the selected TAs fast enough to direct a repulsion force towards the TAs' mean position. Instead, an oblique force is applied, inducing a collective spin in the TAs (see S5 Video) which the HAs reactively follow. When considering parameters resulting in a mean peak frequency of � 0.5 Hz, the corresponding HA movement speed was approximately 0.91 m�s -1 (SD = 0.46), which reflects the preferred walking speeds of participants during circular turning-0.96 ± 0.1 m�s -1 [39]. Given these settings, the HAs on average produced 4.67 cumulative 2π rotations around the TA herd (SD = 1.93), as opposed to 0.91 cumulative 2π rotations (SD = 0.46) when the agents' parameterization resulted in a mean peak frequency of > 0.5 Hz. Note, rotations when mean peak frequency was > 0.5 Hz were due to transient behaviors by the artificial HAs at the beginning of the simulation. Once the TAs were contained, the artificial HAs stabilized on in-phase oscillations (see Fig 5, bottom left). (m = 1 kg). Parameters above the line indicate overdamping (z > 1), while those below indicate underdamping (z < 1). Artificial HAs embodying the task-dynamic model defined in Eqs 1-3 exhibited circling movements around the target agent (TA) herd when the model was overdamped (top left). When the model was underdamped, the two artificial HAs exhibited oscillatory behavior (bottom left) at a peak frequency consistent with previous work studying human collaborative problem-solving (bottom-right power spectrum) [11,17]. Angular displacements in the negative/positive direction indicate counterclockwise/clockwise motion. See also S4 and S5 Videos. https://doi.org/10.1371/journal.pone.0260046.g005

PLOS ONE
Task dynamics define the contextual emergence of human corralling behaviors

Model simulations reproduced the effect of task difficulty on human coordinated circling behavior
The role of task difficulty in the emergence of circling behavior exhibited by both human participants and artificial HAs becomes clearer when comparisons are made between human participant and artificial HA behavior, with model simulation parameters selected which result in movement performance in the range of the human dataset. As shown in Fig 6,   , and m was set equal to a constant value of 1 kg. Parameters above/below the line indicate a system that is over/under-damped. The color coding of data points is defined according to mean movement speed of the HAs (top row), number of cumulative 2π rotations by the HAs (middle row), and mean distance traveled (i.e., path length) by the target agents (TAs; bottom row). From left to right, figure columns represent simulations at three increasing levels of task difficulty according to correspondingly increasing levels of assigned TA speeds: Low-speed (� 0.12 m�s -1 ; left column), medium-speed (� 0.20 m�s -1 ; middle column), and high-speed (� 0.28 m�s -1 ; right column). Different combinations of stiffness and damping resulted in differences in the qualitative behaviors adopted by the artificial HAs (middle row), as well as in the HAs' ability to contain the TAs (bottom row). The gray areas in all figures designate the regions of b-ε parameter space that fall within the 95% confidence interval of human locomotion speeds in these difficulty condition (constructed using HA Mean Speed [top row]). See also S1 Fig. https://doi.org/10.1371/journal.pone.0260046.g006 95% CI) (the gray areas in Fig 6). When constrained to reflect the movement performance of human participants, the simulations reproduced the qualitative behaviors observed in the current experiment. Specifically, as task difficulty increased, the artificial HAs produced more cyclical behaviors in a fixed direction (low-speed M = 3.02 cumulative 2π rotations, SD = 1.02; medium-speed M = 3.93, SD = 1.52; high-speed M = 5.60, SD = 2.74). However, unlike the results from human participants, as task difficulty increased, so did the TAs' distance travelled (low-speed M = 6.54 m, SD = 2.25; medium-speed M = 8.47, SD = 3.79; high-speed M = 10.77, SD = 5.77), highlighting that the cyclical behaviors of the artificial HAs was in response to the fleeing behavior of the TAs.

Discussion
Novice participants, when tasked to corral seven evasive target agents (TAs) in a task environment which required locomotion, developed a coordinated circling strategy which kept the agents sufficiently contained. This circling behavior was distinct from the oscillatory (i.e., COC) behavior observed by dyads in previous research where participants completed the task using hand movements [11,17]. However, simulations demonstrated that both behaviors can be understood as emergent properties of the same underlying task dynamics, captured using the task-dynamic model detailed in Eqs 1, 2 and the TA selection rule (Eq 3). Further, this model can also reproduce individual circling behavior as witnessed in previous research (see S6 Video) [11,12].
The task-dynamic model presented here shares features with other bio-inspired modeling approaches. Strömbom et al. [40] modeled Australian sheepdog retrieval behavior when pursuing sheep that are farthest from the herd and driving the sheep to the herd's center of mass. Further, Muro et al. [7] demonstrated that various complex group hunting behaviors in wolves can be recreated by two simple rules followed by individual wolves-namely, to move directly towards the prey until a safe distance is reached, and then to move away from neighboring wolves during containment. There are notable differences, however, between the corralling task that human participants completed in the present study and what is observed in nature. For instance, the TAs corralled by participants in this study, as well as in previous work [11,12,17], exhibited dynamics different from what is observed in natural herding and hunting contexts. Unlike real animals who clump when threatened [41], the TAs in this study were not coupled to each other. Thus, participants in this study had to continually act to keep the herd contained by keeping the forces directed towards the herd's center of mass. In contrast, canids either make periodic movements to contain fleeing agents escaping from the flanks of the herd, as is the case in shepherding [40], or equally disperse themselves around a prey, as is the case in wolf-pack hunting [7]. However, when the escape capabilities of prey increase, more explicit encirclement behaviors are observed. For example, cetaceans like orcas [8] and humpback whales [9,10] exhibit carousel/bubble-net feeding whereby members of a pod will encircle and blow bubble ring structures to entrap fish. In non-animal systems where the artificial herders' task is to corral large numbers of artificial agents that do not have preferences to clump, explicit circling behaviors are also adopted [13]. Thus, due to the relative difficulty of the task, circling behaviors by human participants reflected a coordination strategy that maximized control to keep the TA herd contained.
The discovery of encirclement behaviors by participants may reflect interaction dynamics that scaffold their realization [42]. As demonstrated through simulation, artificial agents implementing the task-dynamic model defined in Eqs 1-3 unveiled oscillatory and circling behaviors as a latent, emergent property of the task context. Similarly, what may differentiate high performing from unsuccessful human dyads are differences in interactions which govern whether these latent dynamics are generated. Or, once generated, participants may vary in their ability to detect these latent possibilities to guide interpersonal coordination. Once detected, however, dyads can then learn to exploit this emergent property intentionally [28,36], removing the dynamic as a latent feature of the task, and instead producing these dynamics explicitly.
The discovery and utilization of explicit coordinated circling by participants in this experiment can be modeled by modifying Eq 5, which defines individual circling behavior, to include a repulsive reactive coupling term, À ðy HA i À y HA j Þ. This term serves to maximally separate HA i from its co-actor j while circling, resulting in an angular difference of 180˚, consistent with what was observed in the human experiment, Given the different corralling behaviors that can be modeled by Eqs 2 and 6, respectively, a step function can be used to switch between either equation, ( whereby Eq 2 is implemented if the pursued TA's radial position, r TA ðtÞ;i , exceeds the distance of the containment threshold, r Δ (see Fig 2), otherwise Eq 6 is implemented. Thus, in addition to discovering the latent dynamics defined by Eq 6, participants are also hypothesized to learn to detect contextual information which determines the appropriateness of circling behavior via a control law [16] (e.g., Eq 7). In skill learning, an individual develops coordination patterns, or synergies [43,44], which define how the many degrees-of-freedom of one's body should interact to allow for the control of movement [43]. Once these synergies are formed, they must be appropriately coupled to environmental information that specifies the control requirements for a given task, which are low-dimensional in comparison. This interactive coupling between agents and their environment can be formalized using the task-/behavioral-dynamic modeling framework discussed in this paper [45]. The challenge when moving from individual to collective and collaborative task contexts is to understand how groups divide a task between members. An answer to this problem requires both an understanding of what information agents use to make individual decisions in group contexts [46,47], as well as how these decisions are communicated to others agents through coupling [48][49][50][51][52]. As with individual behavior, human groups can form interpersonal synergies whereby individuals in a collective will provide compensatory support along task-relevant dimensions to help achieve a joint goal [53,54]. This ability to coordinate is maximal when agents are similar in their movement kinematics or complexity [55][56][57].
However, the identification of the task-relevant components that need to be controlled may not always be transparent. Recent advancements in machine learning approaches may provide an opportunity, in conjunction with theoretical modeling, to uncover the necessary control laws and coordination couplings [58]. Agents trained using deep reinforcement learning (DRL; i.e., the integration of reinforcement learning with deep neural networks), for example, have been successful in discovering adaptive behavior and strategies in individual [59] and group task contexts [60,61]. Within the context of working with humans in collaborative tasks, such agents can develop control policies that are either user-specific [62] or generalize to a distribution of human strategies during training [63]. By giving meaning to actions with the use of reward functions [64], black-box self-supervised approaches have the ability to provide a "direct fit" [65] between an agent and task-relevant states-assuming there is sufficient sampling of the task environment. Indeed, the reason why deep neural networks may be so successful in certain task domains may be because of their ability to detect the low-dimensional structure of the world [66]. As demonstrated with the recent successes in DRL, deep neural network architectures can not only detect invariances in physical properties such as in image detection [67], but may also be able to detect the low-dimensional structure of animalrelevant properties (i.e., affordances [26]), which can be formalized using task-dynamic models, in constraining animal, including human, behavior.

Participants
Eighty undergraduate students from the University of Cincinnati (M age = 18.93 years, SD = 0.90), recruited as dyads, participated in the experiment. One dyad was later removed from analysis due to a premature software closure. Participants received research credit towards completion of a Psychology course requirement. The study was approved by the University of Cincinnati's Institutional Review Board.

Materials and task
The corralling task was designed as an immersive virtual reality experience where dyads could locomote in a shared physical and virtual space to retrieve and contain evasive target agents (TAs) see Fig 3). The task software was developed using Unity (ver. 5.6.12, Unity Technologies, San Francisco, USA) and a wireless local area network (WLAN) was utilized to synchronize task and participant states using Unity's UNET server-authoritative networking protocol. Participants wore backpacks containing portable computers (MSI VR-One, Micro-Star International, Taiwan) which were equipped with an HTC Vive virtual reality headset (HTC Inc., Taiwan). This setup allowed for the free movement of participants within an 8 × 6 m space. Within the virtual environment, participants were embodied as floating avatars (avatar shoulder width = 0.42 m), calibrated to each participant's height. The avatars were controlled by the participants' head movement and an inverse kinematics controller (FinalIK, Rootmotion, Estonia) was responsible for rotations of the avatar's torso. Task and participant states were recorded at 50 Hz.
The virtual environment contained a grass field (6 × 3.48 m), seven TAs, and a herding agent (HA) for each participant. The TAs were modeled as spheres (radius = 0.24 m, mass = 2 kg) and their native behavior was governed by Brownian motion. The TAs could leave the task space by falling over the edge of the grass field. Each participant's HA was represented as a blue or orange colored cube (edge length = 0.15 m) with the cubes' positions (x, z) set to their respective participant's head position at each timestep. Interactions with the TAs by participants were done through their HA cubes. Whenever a participant's HA was within 0.6 m of a TA, the Brownian force acting upon the TA was replaced by a repulsive force directed away from the participant's HA (the dynamics of which is consistent with the method reported in [11]). The maximum speed the TAs could move was clamped to not exceed the value set by the task's condition (� 0.12, � 0.20 or � 0.28 m�s -1 ). This repulsion force was the only means of interacting with the TAs. Collisions between HAs and TAs were not possible, although TAs were able to collide with each other.
Before each trial, participants moved their respective HA to a start location (x, z) = (0, ±1.2 m) on their respective side to jointly initiate a trial. Once initiated, seven TAs appeared centrally clustered on the grass field between both participants (cluster center [x, z] = [0, 0]; cluster radius = 0.36 m). The aim of the task was to contain all seven TAs within 0.72 m of the TA's mean position (i.e., the TAs' centroid), calculated at each timestep. When all TAs were within this threshold, the color of the TAs turned red (otherwise they remained a white color). Each trial was 2 minutes in length, and a trial was completed successfully if all TAs were sufficiently contained for at least 70% of the last 90 s of the trial (i.e., for at least 63 seconds). If the 70% criterion was not met, or if a trial ended prematurely due to a TA falling over the edge of the grass field, the trial was considered a failure. At the end of each trial, participants received visual feedback regarding their performance. If the entire 120 s trial duration elapsed, the feedback was the percentage of time the TAs were contained during the last 90 s of a trial. If the trial ended prematurely, the visual feedback was a message that read "Try Again!".

Design and measures
The experiment with human participants implemented a between-subject design. Dyads were randomly assigned to one of the three maximal TA speed conditions (low-speed = 0.12 m�s -1 ; medium-speed = 0.20 m�s -1 , high-speed = 0.28 m�s -1 ). Task performance as well as human participant and simulated agent behaviors were assessed for successful trials. Measures were calculated for the entire trial duration (120 s). The measures used to assess task performance were TA containment time (s)-the total time the TAs were within the containment criteria-and TA distance travelled (m)-the mean TA path length (cumulative sum of the displacements).
The following measures were used to assess both participant and simulated (see Model simulations) HA behavior. First, the degree to which HAs performed cyclical movements around the TA herd was computed by taking the cumulative sum of the angular change of their movement with respect to the TA herd's mean position, divided by 2π (referred to as cumulative 2π rotations). The absolute value of the result was taken to account for preferences for clockwise/ counterclockwise motion. Relatedly, dyad cycling direction was the sign of cumulative 2π rotations prior to taking the absolute value. A negative/positive value indicated a preference for clockwise/counterclockwise motion. Next, to quantify the coordination between HAs, the angle of separation was computed as the mean angle between x HA 1 to x HA 2 in the counterclockwise direction (i.e., from 0 to 2π), where x is the position vector with respect to the TA herd's mean position. Finally, HA movement speed (m�s -1 ) was defined by calculating the mean cumulative sum of the displacements by both HAs and dividing the result by the trial duration.
For the simulations conducted and presented in Fig 6, the degree of oscillatory behavior exhibited by artificial HAs was also explored. The mean peak frequency (Hz) was computed by constructing a frequency power spectrum of each HA's angular position time series using MATLAB R2020's pwelch function. The mean peak frequency was defined by the frequency with the most power between 0.2 to 2 Hz and averaged across the artificial HAs. A Hamming window of 1024 samples with a 50% window overlap was used in constructing the frequency power spectrum.

Procedure
Following informed consent, participants were taken to the testing room and received task instructions. Dyads were told that they had 45 minutes to complete eight successful trials of the corralling task. The experimented ended either when 45 minutes elapsed, or when eight successful trials were completed (whichever came first). Although trial success was determined during the last 90 s, participants were told to keep the TAs (referred to as "sheep" to participants) sufficiently contained for as long as possible. The experimenter clarified to participants that there was no pre-defined location for where the TAs should be contained-only that the TAs should be kept together somewhere on the grass field. Additionally, participants were informed that they could not communicate during the task, or during any breaks. The experimenter in the room enforced this no-talking policy.

Model simulations
Simulations using the same task environment as presented to human participants (excluding the avatars) were conducted with two artificially controlled HAs implementing the taskdynamic model presented in Eqs 1 and 2 using the TA selection rule defined by Eq 3 in Target agent selection below. The simulations were conducted using Unity (ver. 2017.4.40f, Unity Technologies, San Francisco, USA). The behaviors of the artificial HAs were manipulated by setting the stiffness, ε, and damping, b (via the damping ratio, z), parameters for both Eqs 1 and 2. Relative to a given ε, a system can be described as under-/over-damped via the damping ratio, z, which is the ratio between the value of b and its critical value, which is equal to 2 ffi ffi ffi ffi ffi ffiffi mε p (where m = 1 kg). When z<1, the system will intersect the attractor in less time but will overshoot. When z>1, the system will approach the attractor slowly. When z = 1, the system will approach the attractor with the least amount of time that results in at most one overshoot [68]. By varying z to define b, it is possible to assess the effect of underdamped or overdamped dynamics on the resultant patterns of behavior during the corralling task as a function of ε. In addition to setting stiffness and damping, the minimal radial distance to a selected TA, r min , in Eq 1 was also varied. A summary of the parameters that were considered is presented in Table 1. At each timestep, each artificial HA selected a TA to pursue using the selection rule defined in Eq 3 (see Target agent selection below, Δt = 1). The artificial HAs completed five trials with each parameter combination, and trials were 120 s in duration. The dynamics governing the HA and TAs behaviors, including the TA selection rule, were updated at 50 Hz. The simulation speed was set to 100 times faster than real time.
Target agent selection. In a corralling task with multiple TAs (TA k ; where k = 1, 2, . . ., M), HA i is hypothesized to implement the intuitive rule of selecting and moving towards the TA that is (i) farthest from the task goal [11,12,17] and is (ii) moving away from the goal. These rules can be expressed as follows: where, and x TA ðtÞi and _ x TA ðtÞi represents the position and velocity vectors, respectively, of TA (t),i in relationship to C in Cartesian space. The value _ x TA ðtÞi Dt represents the positional increment added to x TA ðtÞi at time t to predict TA (t),i 's position at t+Δt. To summarize, an HA is hypothesized to select the TA at time t that will be farthest from C at time t+Δt.
In the presence of multiple HAs, HA i only considers TAs that are closer to themselves than to their co-actors. For two HAs, this can be understood as both HAs (i and co-actor j) creating a boundary, � B, that is the perpendicular bisector of the line between them, � x HA i x HA j , at their mean position � x HA (see Fig 2). where Eq 12 is the same as Eq 3. To summarize, in task contexts with multiple HAs and TAs, the TA selection rules described above ensure that each HA i will pursue the TA in its assigned subset that is both nearest to HA i at time t and predicted to be the farthest from C at time t +Δt. Although only two HAs were considered in the simulations presented here, these selection rules can be extended to larger HA groups.