Nash Equilibria in Multi-Agent Motor Interactions

Social interactions in classic cognitive games like the ultimatum game or the prisoner's dilemma typically lead to Nash equilibria when multiple competitive decision makers with perfect knowledge select optimal strategies. However, in evolutionary game theory it has been shown that Nash equilibria can also arise as attractors in dynamical systems that can describe, for example, the population dynamics of microorganisms. Similar to such evolutionary dynamics, we find that Nash equilibria arise naturally in motor interactions in which players vie for control and try to minimize effort. When confronted with sensorimotor interaction tasks that correspond to the classical prisoner's dilemma and the rope-pulling game, two-player motor interactions led predominantly to Nash solutions. In contrast, when a single player took both roles, playing the sensorimotor game bimanually, cooperative solutions were found. Our methodology opens up a new avenue for the study of human motor interactions within a game theoretic framework, suggesting that the coupling of motor systems can lead to game theoretic solutions.


Experimental & Theoretical Setup
The most general theoretical framework for multi-agent motor interactions is dynamic game theory [1] in the case of competing players and optimal control theory [4] in the case of cooperating players. In our experiments the conditions of cooperation and competition were translated into a one-player and a two-player condition, respectively. In the one-player condition the two robot handles were controlled by the two arms of one person. In the two-player condition the two robot handles were controlled by two different players. In the one-player condition bimanual coordination can be understood as a player minimizing a cost function that takes into account both task requirements and energy consumption (see [3] for reference). In the two-player condition both players try to minimize their respective cost functions, but the global minimum need not be a stable solution. Dynamic game theory allows predicting the stable Nash solution in which players minimize their maximum loss.
In both optimal control theory and dynamic game theory, analytic solutions to multi-agent control problems can be found if the problems can be formulated as linear dynamical systems. As described in the main text, we conducted two experiments:
• The prisoner's dilemma motor game. Our version of the prisoner's dilemma defines a nonlinear dynamical system, because the force that is experienced by a player depends on terms that are bilinear in the arm position of that player. This can be seen as follows: the force $f$ depends on both the spring constant $K$ and the y-position, such that $f = Ky$. The spring constant, however, depends on the x-positions of the two players such that $K(x_1, x_2) = \alpha x_1 + \beta x_2$. Thus, the force depends on the bilinear terms $y x_1$ and $y x_2$. To analyze the results of this experiment we therefore discretized the $x_1 x_2$-space and performed a traditional discrete matrix game analysis (see [1] for reference).
• The rope-pulling game. This game can be expressed as a linear dynamical system. Each of the two players controls a robot handle, with positions $x^{H_1}_t$ and $x^{H_2}_t$ in the horizontal plane respectively, which together control a virtual mass point such that [equation (1) missing in source], where $x^C_t$ is the mass point position at time $t$, $D_{\phi_1}$ and $D_{\phi_2}$ are visuomotor rotation matrices, and $\alpha$ is a scaling parameter. The scaling parameter was set to $\alpha = 2$ throughout our experiment to confine arm movements to a small space and thereby avoid collisions of the robot handles. Additionally, an isotropic spring was simulated on each robot handle to provide proprioceptive feedback of the distance moved. Since the spring was isotropic, we do not need to model it explicitly and can use the moved distance as a proxy. In the experiment the rotation angles $\phi_i$ of the visuomotor rotation matrices $D_{\phi_i}$ were drawn randomly every 80 trials from the set {−135°, −90°, −45°, 0°, +45°}, but in the model we assume that the rotations are known to the player, which corresponds to a situation after learning has taken place.
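For the prisoner's dilemma motor game, the discrete matrix game analysis mentioned above amounts to tabulating each player's cost on a grid of $(x_1, x_2)$ positions and searching for cells from which neither player can lower their own cost unilaterally. The following Python sketch illustrates the idea. The grid resolution and the values of `ALPHA`, `BETA` and `OFFSET` are hypothetical, chosen only so that the table has a prisoner's dilemma structure; they are not the parameters of the experiment.

```python
# Hypothetical parameters (not from the experiment): a player's spring
# constant rises with the own x-position and falls with the partner's.
ALPHA, BETA, OFFSET = 1.0, -2.0, 3.0
LEVELS = [k / 4 for k in range(5)]  # discretized x-positions in [0, 1]


def spring_constant(x_own, x_other):
    """Cost of one player: linear in both x-positions (offset is an assumption)."""
    return OFFSET + ALPHA * x_own + BETA * x_other


def pure_nash(cost1, cost2):
    """Return all cells (i, j) where no player gains by deviating alone."""
    n_rows, n_cols = len(cost1), len(cost1[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Player 1 chooses the row, player 2 chooses the column.
            best_for_1 = all(cost1[i][j] <= cost1[k][j] for k in range(n_rows))
            best_for_2 = all(cost2[i][j] <= cost2[i][k] for k in range(n_cols))
            if best_for_1 and best_for_2:
                equilibria.append((i, j))
    return equilibria


# cost1[i][j]: cost of player 1 when player 1 picks LEVELS[i], player 2 LEVELS[j].
cost1 = [[spring_constant(x1, x2) for x2 in LEVELS] for x1 in LEVELS]
cost2 = [[spring_constant(x2, x1) for x2 in LEVELS] for x1 in LEVELS]

nash = pure_nash(cost1, cost2)

# Cooperative benchmark: the cell that minimizes the summed cost of both players.
cells = [(i, j) for i in range(len(LEVELS)) for j in range(len(LEVELS))]
cooperative = min(cells, key=lambda ij: cost1[ij[0]][ij[1]] + cost2[ij[0]][ij[1]])
```

With these illustrative numbers the only pure Nash cell is the one where both players choose the lowest x-position, whereas the summed cost is smallest where both choose the highest, which is the defining tension of a prisoner's dilemma.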
In the following we present the analytical solution to the rope-pulling game.
The optimal solution for the bimanual condition can be computed based on optimal feedback control for Linear-Quadratic-Gaussian (LQG) systems [4]. LQG models deal with linear dynamic systems, quadratic cost functions as performance criteria, and Gaussian random variables as noise. Here we consider the following model and its variables, including the dynamic state [equations missing in source]. The random variable $\xi_t \in \mathbb{R}^n$ is a realization of an independent, zero-mean Gaussian noise process [equation missing in source]. The system matrices $F$ and $G$ define the dynamics of the system. To simplify the analysis we consider an infinite-horizon cost function [equation missing in source]. Time is discretized in bins of 10 ms. The model predictions can be seen in Figure S1.
Arm Model. Following previous studies [5,6] the hands are modeled as point masses $m$ with two-dimensional positions $p^{H_i}(t)$ and velocities $v^{H_i}(t) = \dot{p}^{H_i}(t)$ with $i = 1, 2$. The two hands are designated by $H_1$ and $H_2$ respectively. The combined action of all muscles on the hands is represented by the force vectors $f^{H_i}(t)$. The neural control signals $u^{H_i}(t)$ are transformed to these forces through second-order muscle-like low-pass filters with time constants $\tau_1$ and $\tau_2$. At every instant of time, the two hand motions are mapped to the virtual cursor motion by equation (1). Thus, the virtual cursor position $p(t)$ is related to the hand positions by [equation missing in source]. Put together, this yields the following system equations [equations missing in source]. Equation (5) can be written equivalently as a pair of coupled first-order filters with outputs $g$ and $f$. This allows us to formulate the state space vector $x(t) \in \mathbb{R}^{16}$ as [equation missing in source], where the target location is absorbed in the state vector. When discretizing the above equations with time bin $\Delta$ the following system matrices are obtained [equations missing in source]. In our simulations we set $m = 1$ kg and $\tau_1 = \tau_2 = 0.04$ s.
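To make the muscle model concrete, the following Python sketch integrates one movement axis of one hand with the Euler method. It assumes a commonly used form of the second-order filter as two cascaded first-order filters, $\tau_1 \dot{g} = u - g$ and $\tau_2 \dot{f} = g - f$, driving a point mass with $m \dot{v} = f$. The exact system equations are not reproduced in this text, so this form, the function names `step` and `simulate`, and the constant test input are assumptions for illustration. The constants follow the simulation settings stated above.

```python
M = 1.0       # point mass in kg
TAU1 = 0.04   # time constant of the first filter stage in s
TAU2 = 0.04   # time constant of the second filter stage in s
DT = 0.01     # time bin in s (10 ms)


def step(state, u):
    """Advance the state (p, v, f, g) by one Euler step under control signal u."""
    p, v, f, g = state
    return (
        p + DT * v,                # position integrates velocity
        v + DT * f / M,            # velocity integrates force / mass
        f + DT * (g - f) / TAU2,   # second filter stage: g -> f
        g + DT * (u - g) / TAU1,   # first filter stage: u -> g
    )


def simulate(u, n_steps):
    """Return the list of states visited under a constant control signal u."""
    state = (0.0, 0.0, 0.0, 0.0)
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state, u)
        trajectory.append(state)
    return trajectory


# Two seconds of constant input: the force settles at the input level,
# with a lag set by the two time constants.
trajectory = simulate(u=1.0, n_steps=200)
```

In this sketch the force follows a step in the control signal smoothly rather than instantaneously, which is the low-pass behaviour the arm model attributes to muscle.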
Cost Matrices. The cost matrices $Q$ and $R$ are defined as follows: [equations missing in source]. Importantly, the $Q$-matrix punishes deviations from the x- and y-components of the virtual target equally, i.e. both components must be optimized as required for global optimality. In our simulations we set $w_p = 1$, $w_v = 0.02$ and $w_e = 10^{-5}$.
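Optimal Policy. The optimal control policy is given by [equation missing in source] with [equation missing in source]. The matrix $S$ can be easily computed by solving the algebraic Riccati equation [equation missing in source]. The optimal policy $u_t$ is 4-dimensional and comprises the optimal control policies for both hands [equation missing in source].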
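As an illustration of how such a policy can be obtained numerically, the following Python sketch iterates a discrete-time algebraic Riccati equation to convergence. It assumes the textbook form with dynamics $x_{t+1} = F x_t + G u_t$, stage cost $x_t^T Q x_t + u_t^T R u_t$, policy $u_t = -L x_t$, gain $L = (R + G^T S G)^{-1} G^T S F$ and $S = Q + F^T S F - F^T S G L$, which may differ in notation and detail from the equations of the paper. To stay short, it uses a hypothetical reduced plant: one movement axis of a point mass without the muscle filter and with a single control input, so that $R$ is the scalar `r`. The weights reuse $w_p$, $w_v$ and $w_e$ from the text.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def gain(S, F, G, r):
    """Feedback gain L = (r + G'SG)^-1 G'SF for a single control input."""
    GtS = matmul(transpose(G), S)
    denom = r + matmul(GtS, G)[0][0]
    return [[value / denom for value in matmul(GtS, F)[0]]]


def solve_riccati(F, G, Q, r, tol=1e-10, max_iter=10000):
    """Iterate S <- Q + F'SF - F'SG L until S stops changing."""
    n = len(Q)
    S = [row[:] for row in Q]
    for _ in range(max_iter):
        L = gain(S, F, G, r)
        FtS = matmul(transpose(F), S)
        FtSF = matmul(FtS, F)
        correction = matmul(matmul(FtS, G), L)
        S_new = [[Q[i][j] + FtSF[i][j] - correction[i][j] for j in range(n)]
                 for i in range(n)]
        change = max(abs(S_new[i][j] - S[i][j]) for i in range(n) for j in range(n))
        S = S_new
        if change < tol:
            return S
    raise RuntimeError("Riccati iteration did not converge")


DT, M = 0.01, 1.0                      # time bin (s) and mass (kg)
W_P, W_V, W_E = 1.0, 0.02, 1e-5        # position, velocity and effort weights
F = [[1.0, DT], [0.0, 1.0]]            # state: (position error, velocity)
G = [[0.0], [DT / M]]
Q = [[W_P, 0.0], [0.0, W_V]]

S = solve_riccati(F, G, Q, W_E)
L = gain(S, F, G, W_E)

# Closed-loop simulation from a unit position error.
x = [[1.0], [0.0]]
for _ in range(1000):
    u = -matmul(L, x)[0][0]
    Fx = matmul(F, x)
    x = [[Fx[i][0] + G[i][0] * u] for i in range(2)]
```

Plain fixed-point iteration is used here only for transparency; dedicated Riccati solvers are usually preferable for the full 16-dimensional model.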

The Two-Player Condition
Nash strategies can be defined for linear quadratic games [1]. A linear quadratic game is characterized by linear system equations and quadratic cost functions for all players. In the following we assume [equation missing in source] with the dynamic state $x_t \in \mathbb{R}^n$, the control signal of player 1 $u^{P_1}_t \in \mathbb{R}^m$ and the control signal of player 2 $u^{P_2}_t \in \mathbb{R}^m$. The system matrices $F$, $G_1$ and $G_2$ define the dynamics of the system. In this two-player game there are two cost functions to consider [equations missing in source] with the expected cumulative cost of player 1 $J_1 \in \mathbb{R}^*_+$, the expected cumulative cost of player 2 $J_2 \in \mathbb{R}^*_+$, the state cost matrix of player 1 and the remaining cost matrices [definitions missing in source]. In the following we assume $R_1 = R_2$, i.e. no differences in the penalization of efforts between the two players. The model predictions can be seen in Figure S2.
Arm Model. The system dynamics are identical to the system dynamics in the bimanual condition. Note the equivalence $G = [G_1 \; G_2]$ between the input matrix of the bimanual condition and the input matrices of the two players.
Cost Matrices. The cost matrices for the two players are: [equations missing in source]. Importantly, in the game each player only optimizes his component of the projected virtual cursor independently. This is reflected in the $Q$-matrices. Note that there is a relation between the $Q$-matrix in the bimanual condition and the $Q_i$-matrices in the game condition, namely $Q = Q_1 + Q_2$. The parameters $w_p$, $w_v$ and $w_e$ were set equally in both conditions.
Nash Policy. The Nash equilibrium policies can be computed to be [equations missing in source] with [equations missing in source]. The matrices $S_i$ can be computed by solving the coupled algebraic Riccati equations [equations missing in source].

Figure S1. Predictions for cursor and hand trajectories in our version of the rope-pulling game for cooperating players. In this example the rotation matrices for each hand have been set to the identity matrix and the gain has been set to $\alpha = 2$. The hand trajectories are shown slightly displaced to allow visibility. The optimal feedback control model allows predicting the direction of the hand movements for arbitrary settings of the rotation matrices.

Figure S2. Predictions for cursor and hand trajectories in our version of the rope-pulling game for competitive players. In this example the rotation matrices for each player have been set to the identity matrix and the gain has been set to $\alpha = 2$. The dynamic game control model allows predicting the direction of the hand movements of each player for arbitrary settings of the rotation matrices.
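The coupled Riccati equations of the Nash policy can be illustrated on a deliberately small example. The following Python sketch treats a hypothetical scalar game with dynamics $x_{t+1} = f x_t + g_1 u_1 + g_2 u_2$, where player $i$ pays $q_i x_t^2 + r_i u_i^2$ per step and uses the linear feedback $u_i = -p_i x_t$. Under these assumptions (no cross-penalties on the other player's effort) the stationarity conditions $(r_i + g_i^2 s_i)\, p_i + g_i s_i g_j\, p_j = g_i s_i f$ are linear in the two gains, and the value coefficients obey $s_i = q_i + r_i p_i^2 + f_c^2 s_i$ with the closed-loop factor $f_c = f - g_1 p_1 - g_2 p_2$. This is a textbook form for feedback Nash equilibria in linear quadratic games, not necessarily the exact formulation of the paper, and all numerical values are illustrative. The fixed-point iteration is not guaranteed to converge for arbitrary parameters.

```python
def nash_feedback_gains(f, g1, g2, q1, q2, r1, r2, tol=1e-12, max_iter=10000):
    """Iterate the coupled scalar Riccati equations of a two-player LQ game."""
    s1, s2 = q1, q2
    for _ in range(max_iter):
        # Stationarity conditions of both players form a 2x2 linear system
        # in the gains (p1, p2); solve it with Cramer's rule.
        a11, a12, b1 = r1 + g1 * s1 * g1, g1 * s1 * g2, g1 * s1 * f
        a21, a22, b2 = g2 * s2 * g1, r2 + g2 * s2 * g2, g2 * s2 * f
        det = a11 * a22 - a12 * a21
        p1 = (b1 * a22 - a12 * b2) / det
        p2 = (a11 * b2 - a21 * b1) / det

        # Closed-loop factor and updated value coefficients of each player.
        fc = f - g1 * p1 - g2 * p2
        s1_new = q1 + r1 * p1 ** 2 + fc ** 2 * s1
        s2_new = q2 + r2 * p2 ** 2 + fc ** 2 * s2

        converged = abs(s1_new - s1) < tol and abs(s2_new - s2) < tol
        s1, s2 = s1_new, s2_new
        if converged:
            return p1, p2, s1, s2
    raise RuntimeError("coupled Riccati iteration did not converge")


# Symmetric illustrative game: both players face identical costs and dynamics.
p1, p2, s1, s2 = nash_feedback_gains(
    f=1.0, g1=1.0, g2=1.0, q1=1.0, q2=1.0, r1=1.0, r2=1.0
)
closed_loop = 1.0 - p1 - p2
```

Each player's gain here is a best response to the other player's gain, which is what distinguishes the Nash policy from the single Riccati equation of the bimanual condition, where one controller chooses both control signals jointly.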