An Information Theoretic Measure for Robot Expressivity

This paper presents a principled way to think about articulated movement for artificial agents and a measurement of platforms that produce such movement. In particular, in human-facing scenarios, the shape evolution of robotic platforms will become essential in creating systems that integrate and communicate with human counterparts. This paper provides a tool to measure the expressive capacity, or expressivity, of articulated platforms. To do this, it points to the synergistic relationship between computation and mechanization. Importantly, this way of thinking gives an information theoretic basis for measuring and comparing robots of increasing complexity and capability. The paper provides concrete examples of this measure in application to current robotic platforms. It also provides a comparison between the computational and mechanical capabilities of robotic platforms and analyzes order-of-magnitude trends over the last 15 years. The implications for future work are a method by which to quantify movement imitation, an outline of a way of thinking about designing expressive robotic systems, and a contextualization of the capabilities of current robotic systems.


I. INTRODUCTION
As robotic systems move outside of the factory and into human workplaces, public spaces, homes, and even bodies, the pattern of movement which each platform produces will become essential to understand and to design intentionally. Further, as we aim to build and integrate multi-purpose robots that can adapt to many tasks and many scenarios, an understanding of how much a single platform can do, relative to another, will become important. Finally, in both of these spaces, a notion of human movement imitation is enticing, and thus how the success of such imitation is measured is important.
This paper offers an approach to thinking about robotic platforms that can aid in each of these domains. In particular, the idea of expressive movement may guide how we measure the efficacy of robots in human-facing scenarios. Expressive movement is often described in terms of the different manners in which a robot should move in a care-giving setting versus an authoritarian one, that is, as a modulation of a superficial or decorative style that is independent of practical task.
Robots that enter private spaces such as hospital rooms and homes should modulate their behavior in order to bring comfort and respect to human counterparts in such environments (just as humans modulate their own movement). On the other hand, a robot guiding humans out of a burning building or directing traffic should use clear, aggressive movements to impart the gravity of an emergency situation and ensure that each motion conveys a clear command. Thus, the ability to move expressively increases the function of such platforms. In these cases, we imagine that information passes to the human counterpart based on the motions of the robotic platform.
However, even in the factory setting, modulations can improve performance. Take for example a robot that may need to modulate the motions it uses to apply paint to a surface in order to compensate for the thickness of the paint on a given day: sometimes a 'flicking' quality will be more effective than a 'dabbing' one. Thus, to distinguish between functional and expressive movements is a bit of a matter of perspective.
In either case (functional or expressive), we often attempt to imitate the behavior of humans. Indeed, it seems that in natural human settings the movement of humans encodes information. To generalize the examples given here, this information may deal with environmental state ('is the building burning?'), task state ('how thick is the paint?'), or emotional state ('are you in a hurry?' and even 'are you upset?'). Thus, we can think of an expressive robotic motion as an information source and the process of interpreting it as a noisy channel to a receiver. This paper will begin to formalize such a setup.
Imitating the movement of biological organisms has been a topic in animation [33] and robotics [4,27,22,29,2,18], and is often dependent on the parameterization of the creature's movement. In [40,27,9,19,20,14] motion capture data is used to seed artificial representations. In several of these cases, continuous, trajectory-based measurements for success have been posed. The review in [4] discusses "robots that imitate humans" saying "there are many ways in which a robot can be made to replicate the movement of a human" and describing "very high fidelity playback." Instead, this paper takes the approach of measuring the complexity of such movement via a discrete, information theoretic approach, as [32,7] did for workspace sensing tasks. In this case, we look to the dimensionality of the data used to represent motion (rather than the trajectory recreation of an individual degree of freedom). A review of datasets of human motion indicates this number is on the order of tens or hundreds of parameters [37]. Even in a new sensor-rich environment [15], models of a similar dimension are extracted. In [34,6] the notion of "low-dimensional" signals is introduced by using sparse (or even sparser) marker sets; these representations are termed "physically meaningful." When successful, a proper parameterization of movement reveals much about the biology of the animal, as in [12,13,11]. In that work, the movement of C. elegans was analyzed using a curve parameterized by 100 angles. Then, after capturing many hours of behavior, a principal component analysis was performed, revealing that the structure of the behavior could be explained as a superposition of four primary poses [12]. Observing the animal through this lens provided new insights into the behavior of this well-studied animal [13,11]. C. elegans is a tiny worm with only 302 neurons governing its behavior.
The chosen 100-dimensional representation is then on the same order of magnitude as the number of the organism's neurons. This parameterization is also on the same order of magnitude as the motion capture data of humans analyzed for robotics cited here. However, given that there are thought to be 86 billion neurons in the human brain [3] (or certainly many more than in C. elegans), it is likely that many, many more degrees of freedom are needed to represent, or imitate, human movement. This motivates the need for another perspective with which to compare movement across platforms.
Thus, this paper presents an information theoretic measure of expressivity for robots in Section II. This measure is applied to several distinct robots in Section III. Trends in the capacity for robots to exhibit expressive motion are analyzed using the measure in Section IV; the measure reveals a possible gap between computation and mechanization in modern machines. Discussion of a dynamic extension of the measure is provided in Section V. The paper offers concluding remarks in Section VI.

II. A MEASURE FOR EXPRESSIVITY
A formalism for the concept of expressivity is provided here. This definition is inspired by a comparison of robots to Turing's notion of computation. As motivated in the background above, the feature that most differentiates verbose movement "vocabularies" in moving platforms (machines and animals) is the number of degrees of freedom needed to represent the movement and, thus, available to create complex movement.
Thus, we propose to track all possible configurations of the platform's degrees of freedom. That is, we'll measure the number of shapes that can be achieved kinematically, which is equivalent to the precision with which a number could be recorded on the platform (as transistors are used in computers). We'll call this the kinematic mechanization capacity. We contend that this can be seen as a fundamental limit on the expressivity available to a robot.
To model this is simply a matter of capturing the representation with a number. In analogy, computers use the unit of bits to do this; more complex computational calculations require processors that can hold more bits, and thus use more transistors to do so. Or, an 8-bit display is more expressive than a 1-bit display. Similarly, robots can be viewed as needing more mechanical configurations in order to complete more complex mechanizations. A new mechanical configuration is created by a degree of freedom with more range of motion, more precision in its motion, or a new degree of freedom; any of these increases the expressivity of a platform.
To formalize this idea, let N be the number of actuator types on a machine. On computers with solid state hard drives, or other homogeneous machines, this number is 1 since these machines are made up of many, many transistors. For robots, we may often have most of the machine comprised of servo motors; however we may also have heterogeneity. On a robot with a simple gripper (which is either open or closed) and two identical servos, N = 2.
Let $M_i$ be the number of degrees of freedom with a particular number of available configurations $R_i$, which is computed by counting from an actuator's minimum to its maximum range in steps of its resolution, where $i = 1, \ldots, N$. For a robot comprised of two servos, say with 360° range and 0.1° resolution, and a gripper that may be 'open' or 'closed', $R_1 = 3600$ with $M_1 = 2$ and $R_2 = 2$ with $M_2 = 1$.
From this description of a machine's construction, let

$$C = \prod_{i=1}^{N} R_i^{M_i}$$

be the number of geometric or kinematic configurations (shapes) available to a platform. For the simple robot with an open-close gripper and two servos this becomes

$$C = 3600^{2} \times 2^{1} \approx 2.6 \times 10^{7} \text{ configurations}.$$

Another computation can compare this raw number of configurations to binary displays and computer architecture. From there the kinematic mechanization capacity of that platform is

$$K = \log_2 C.$$

Thus, the robot in the previous example can express a shape containing ≈ 25 bits of information in its environment. Then, just as a display of roughly 25 bits can represent ≈ $2.6 \times 10^{7}$ numbers, this simplistic robot can be compared to a 25-bit display.
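As a minimal sketch of the count described above, the following Python snippet evaluates the number of configurations and the kinematic mechanization capacity (in bits) for the two-servo, one-gripper robot. The function names `configurations` and `capacity_bits` are illustrative and do not come from the paper, and the snippet assumes that the number of settings of a rotary joint is simply its range of motion divided by its resolution.

```python
import math

def configurations(range_deg, resolution_deg):
    """Settings of one rotary degree of freedom (assumed: range / resolution)."""
    return round(range_deg / resolution_deg)

def capacity_bits(actuators):
    """Capacity in bits for a list of (R_i, M_i) pairs: sum of M_i * log2(R_i)."""
    return sum(m * math.log2(r) for r, m in actuators)

servo_settings = configurations(360.0, 0.1)      # 3600 settings per servo
simple_robot = [(servo_settings, 2), (2, 1)]     # two servos, one open/closed gripper

# Raw number of shapes: product of R_i ** M_i over the actuator types.
num_configurations = math.prod(r ** m for r, m in simple_robot)
bits = capacity_bits(simple_robot)

print(num_configurations)   # 25920000
print(round(bits, 2))       # 24.63
```

Summing the logarithms, rather than taking the logarithm of the product, keeps the calculation well behaved for platforms whose configuration counts are far too large to handle comfortably as a single number.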
In this section we present some toy examples:
• For a processor with 200 transistors: $N = 1$, $M_1 = 200$, and $R_1 = 2$. The quantity $K$ is given by: $K = \log_2(2^{200}) = 200$ bits.
• For a manipulator with ten servos with a 0.1° resolution and 360° range of motion: $N = 1$, $M_1 = 10$, and $R_1 = 3600$. The quantity $K$ is given by: $K = \log_2(3600^{10}) \approx 118$ bits.
• For the same manipulator with a gripper which can be open or closed: $N = 2$, $M_1 = 10$, $R_1 = 3600$, $M_2 = 1$, and $R_2 = 2$. The quantity $K$ is nearly the same: $K = \log_2(3600^{10} \times 2^{1}) \approx 119$ bits, only one bit more than the ≈ 118 bits of the manipulator alone.

III. APPLIED EXAMPLES
In this section, the measure introduced in the previous section will be applied to some instructive examples. In particular, we will compare a common humanoid robot and a machine which attracts many interested human viewers, the Bellagio fountains in Las Vegas.

A. Aldebaran NAO Humanoid Robot
The Aldebaran NAO robot is a captivating machine. Indeed, advances in feedback control and robotics were needed to build it. Appropriately, there is something impressive about seeing it move. The computation provided in this section, however, may call into question how useful it can be for replicating human behavior in any context. Figure 1 and Table I outline the basic capabilities of the platform, where the sensor resolution (an encoder with 0.1° precision) has been used to determine $R_i$.

Fig. 1. A diagram which lists the degrees of freedom on an Aldebaran NAO robot. In addition to these mechanical degrees of freedom, the platform contains an ATOM Z530 onboard computer processor, which has 47 million transistors.

Thus, the kinematic mechanization capacity is calculated from the entries of Table I as $K = \log_2 \prod_{i} R_i^{M_i}$. This calculation includes physical combinations which are kinematically or dynamically infeasible. However, changes in motor speed between configurations could also increase the complexity perceived (see Section V), pointing to states that the count misses. Thus, the number may be seen as an approximation.

B. Bellagio Water Fountains
Consider a tourist attraction, like the Bellagio water fountains in Las Vegas, NV. Tourists line up every hour to watch this famous display, routinely included in lists of popular Vegas attractions. This is to say that the fountain display is visually very interesting, or expressive, for human watchers. Let's compare how much more complex it is than typical robots and consider how much less complex it is than most computers via the proposed measure.
The fountain has about 1,200 water cannons with 5,000 lights as part of its display. It also has the ability to create fog and features popular or famous music during the shows. For this analysis, drawing on [8,16], Table II articulates a model for this system. For the Oarsman cannons, which rotate about two axes, we assume a range of motion of 160° with a resolution of 1° in each dimension. We assume the water shooting out of the cannon to be on or off with a single pressure setting. Likewise, the Shooters are either on or off, without articulation. The lights can be 'off' or one of twelve colors (as modeled by a moderate segmentation of the color wheel). We ignore the music that plays alongside.
Thus, to compute the kinematic mechanization capacity, we find the following computation.

$$
\begin{aligned}
K &= \log_2\left(2^{1175+208} \times 160^{208+208} \times 13^{6200}\right) \\
  &= \log_2\left(4.9 \times 10^{8239} \text{ configurations}\right) \approx 27{,}372 \text{ bits} \qquad (4)
\end{aligned}
$$

We could argue over which is more interesting to watch, a NAO or the Bellagio fountains, but this metric provides a quantitative bound on how much more expressive the fountains are: in this case, about two orders of magnitude with respect to the amount of information they can encode. This might strike roboticists as odd, but in terms of system expense and tourist attendance, the measure is consistent.
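The arithmetic in Eq. (4) can be checked with a short script. The sketch below is an illustration rather than the authors' own code: it reads the three factors directly from Eq. (4), labels them according to the model described in the text, and sums logarithms so that the enormous configuration count never has to be formed explicitly. All variable names are hypothetical.

```python
import math

# (R_i, M_i) pairs as they appear in Eq. (4); the labels are our reading of the model.
bellagio = [
    (2, 1175 + 208),    # on/off water state of every cannon
    (160, 208 + 208),   # two articulated axes per Oarsman cannon, 160 settings each
    (13, 6200),         # each light: off or one of twelve colors
]

def capacity_bits(actuators):
    """Capacity in bits: sum of M_i * log2(R_i)."""
    return sum(m * math.log2(r) for r, m in actuators)

k_bits = capacity_bits(bellagio)

# Express the configuration count as mantissa * 10**exponent without overflow.
log10_count = k_bits * math.log10(2)
exponent = math.floor(log10_count)
mantissa = 10 ** (log10_count - exponent)

print(f"K = {k_bits:.0f} bits")
print(f"C = {mantissa:.1f}e{exponent} configurations")
```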
What if all the water cannons were of the articulated, Oarsman variety? In that case, every cannon would contribute two articulated axes in addition to its on/off state, and the capacity $K$ would grow accordingly. Thus here, we can see how adding water cannons and articulation resolution increases the expressivity of the platform, but we do not capture the additional expressivity that the dynamics of timing and water add to (and possibly take away from) the system. For example, by moving with a certain timing, these fountains create different water displays, which add to the system's expressivity. On the other hand, in the presence of water, moving with a particular force, not all points in the cannon's range might be physically feasible.
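One way to make the what-if above concrete is sketched below. It rests on an assumption that is ours rather than the paper's: all 1175 + 208 = 1383 cannons counted in Eq. (4) are treated as Oarsman cannons, each with an on/off state and two axes of 160 settings, while the lights are left unchanged. The numbers it produces are therefore illustrative only.

```python
import math

total_cannons = 1175 + 208   # cannon count taken from the exponents of Eq. (4)

# Hypothetical all-Oarsman model: every cannon is articulated about two axes.
all_oarsman = [
    (2, total_cannons),        # on/off water state
    (160, 2 * total_cannons),  # two axes per cannon, 160 settings each (assumed)
    (13, 6200),                # lights unchanged from Eq. (4)
]

def capacity_bits(actuators):
    """Capacity in bits: sum of M_i * log2(R_i)."""
    return sum(m * math.log2(r) for r, m in actuators)

k_all_oarsman = capacity_bits(all_oarsman)
print(f"K = {k_all_oarsman:.0f} bits under the all-Oarsman assumption")
```

Under these assumptions the capacity rises by roughly two thirds relative to Eq. (4), which illustrates how strongly articulated degrees of freedom weigh in the measure compared with binary ones.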

IV. A COMPARISON BETWEEN MODERN COMPUTERS AND MODERN ROBOTS
This same measure has previously been applied to computers. A simple observation about the rate at which silicon chips were doubling their transistor count, dubbed Moore's Law, has been an important benchmark for computational capacity for the computer processor industry [35]. The premise of the importance of this observation is that more transistors offer more precision in number representation for a single computation. Specifically, adding an additional transistor adds a bit of capacity. (Indeed, this is the origin of the unit used in the previous sections to measure robot expressive capacity.) Robots typically have computers on board. Increasing the computational power of such devices adds to the complexity of internal models for decision making. In this section, we'll compare robotic and computational hardware contained on the same platforms in order to point out order-of-magnitude trends between the computational and mechanical capacity of robots over the last decade and a half.
First, let us note that this distinction is a bit arbitrary. Indeed, computer processors move. Due to the choice of transistors as effective computational elements, we don't see that movement expressed -it's hidden within the electrons of tiny chips. But, for many years computers were made out of mechanical elements. We might, thus, see robots as a return to that original trend in computers. In other words, robots are computers. Thus, in this section, we'll separate degrees of freedom dedicated to computation (transistors) from those dedicated to mechanization (motors), but future machines might meld the two, graying the distinction.
Complete specifications for many important robotic platforms of the last fifteen years could not be found. In order to provide a somewhat large sample size, the range of motion and precision of the actuators for most of the robots presented in this section have been estimated based on viewing motion of the machines and supplemented, where possible, with information from manufacturers and developers. For most platforms, we assume 0.1° actuator precision and estimate the range of motion from watching videos of the platforms' movement. The computation presented in Section III-A is a representative example of those discussed here. The full list of values used in this section is available in the Appendix. Figure 2 shows a plot analogous to those that revealed Moore's law, where the processors listed are housed in selected robotic platforms. Plots like this have been used to track the progress of computational power over time, which has roughly doubled every two years, even serving as a driving goal for the industry. In Moore's plot, each additional component on an integrated circuit represents the ability to represent a larger, or more precise, number on a single chip and thus more precision with which to compute.
Notably, every modern computer can perform many of the same operations, but this plot shows an increase in computational power, even onboard robotic platforms. This representation occurs within the mechanism of transistors. Each new transistor adds a new power of 2 in representation precision. Note that the number of transistors in modern, standalone processors is in the billions. To convert that to possible machine configurations, where actuators are transistors, the number 2 (which is the number of configurations for each actuator) has to be raised to that large number, resulting in a number of configurations that is on the order of $2^{10^{11}}$ or $10^{30102999566}$.

Figure 3 illuminates how the expressivity measure introduced in Section II has evolved on robots over time. Specifically, we plot the number of kinematic configurations over time. Like Moore's proxy of the number of transistors within a given CPU, this kinematic configuration space is not perfect: dynamics will get in the way, just as, in a PC, misaligned clock times and inefficient programming mean that better performance can sometimes be had from lower capacity machines. Still, it gives a starting point for comparison. In parallel, by converting the number of configurations to a number represented in a base 2 number system, the rise in the computational capacities of these platforms can be compared to their mechanization capacities, as in Figure 4. This log-log plot provides a comparison in terms of the number of bits which it would take to describe the largest number that would fit in the onboard CPU versus the number of bits needed to represent each pose. The plot shows a dramatic imbalance between computation and mechanization capacities.
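The conversion from a transistor count to a number of configurations quoted in this section can be checked with a short calculation. The sketch below assumes the round figure of 10^11 two-state elements used in the text and works with logarithms, since the configuration count itself is far too large to write out; the variable names are illustrative.

```python
import math

two_state_elements = 10 ** 11   # round transistor count assumed in the text

# Each two-state element contributes exactly one bit of capacity.
capacity_in_bits = two_state_elements * math.log2(2)

# Number of configurations is 2 ** two_state_elements; report its base-10 exponent.
log10_configurations = two_state_elements * math.log10(2)
base10_exponent = math.floor(log10_configurations)

print(f"capacity: {capacity_in_bits:.0f} bits")
print(f"configurations: about 10^{base10_exponent}")
```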
Consider the Aldebaran NAO robot discussed in Section III-A. Its kinematic configuration capacity is comparable to that of a 1960s computer chip with only 256 transistors. The calculation in Section III-B puts the famous fountain display on the order of complexity of microprocessors made around 1980, for example the Intel 8086, which had 29,000 transistors.
These initial plots are far from complete. More analytical tools can supplement this analysis (see next section), as can user studies that validate how humans react to platforms. However, the trends point to an interesting phenomenon, which has, anecdotally, surprised many roboticists. Indeed, a prominent roboticist initially argued that an iPhone 6 has fewer available static configurations than the Baxter robot (something this analysis should put to rest). Not only does the iPhone 6 have more configurations available, it has many, many orders of magnitude more. Further, this imbalance has remained flat over the past fifteen years. This is an essential piece of information in the discussion [38,24,31], which is by now mainstream [23,28,26], around policy to support a rise in automation and the effects of job loss. The next section begins to develop a method to incorporate dynamic configuration strategies, which can both limit, in the case of physically infeasible poses, and expand, in the case of force sensing and force-controlled robots [30,5], the expressive capabilities of a particular platform.

V. TOWARD DYNAMIC EXPRESSIVE CAPACITY
First, note that we have over-approximated the shape space by ignoring dynamic effects and assuming that all shapes are dynamically feasible. In this paper, every combination of actuator range of motion has been considered. In real systems, however, not all of these shapes are kinematically or dynamically feasible. For example, two robot arms may collide in certain cases of joint articulation, and this is a kinematically infeasible shape. Additionally, some shapes may result in physical instabilities and cause the robot to fall over due to the effects of gravity; this is a dynamically infeasible shape. For the order-of-magnitude analysis done here, it is unlikely that these infeasibilities would greatly alter the results.
Further, two main sources of expressivity have been ignored here: additional configurations available to the source (to continue the analogy to communications) due to dynamic effects, and inferences made by watching the source over time. This section will spend some time discussing the former. As for the latter, future work may investigate topics such as "message size," "information rate," and "channel capacity" in the context of expressive robots based on the perspective presented here. For example, it is likely that a good expressive robot will have some redundancy built into the system, as humans are likely not perfect receivers (which may be modeled as a noisy channel).
On the topic of additional configurations instigated by dynamics, we posit that the perceptual capabilities of humans need to be measured. It's not clear how sensitive to velocity or force differentials humans are. Indeed, in humans, dynamics are often coupled with explicit kinematic changes. In particular, change in muscle tonus is a major source of interpretation of movement expression (and inference of inner state). For example, a person sitting with a clenched jaw, visible through a bulge in their cheek, implies a different inner state than one without. This expression is more viably modeled through the shape deformation of the person's cheek, rather than an explicit measurement of force within the mouth, which a human observer cannot directly intuit.
Consider the example of the Bellagio fountains given in Section III-B, which clearly create different patterns in the water shot into the air based on the variable speed control used on the articulated cannons. We know that for humans a timescale of about 100 ms is perceived to be instantaneous [25]. Thus, we can segment the states of the fountain machinery into intervals of 0.1 s to approximate the dynamic states. Let's assume the Oarsman cannons have a maximum achievable angular velocity of 1000°/s, or 100° per 100 ms.
This assumption means that for each angle the articulated cannons can reach, there are 100 additional states, corresponding to the different velocities at which the cannons can arrive at each position; when the cannon is 'on' and water is flowing, these are visible to a human observer. Thus, we have

$$
\begin{aligned}
K &= \log_2\left(2^{1383} \times (160 \times 100)^{416} \times 13^{6200}\right) \\
  &= \log_2\left(4.9 \times 10^{9071} \text{ configurations}\right) \approx 30{,}135 \text{ bits}. \qquad (7)
\end{aligned}
$$

This of course does not account for the dynamics of the system in a formal way, but it further quantifies the concept of expressivity (communicating information through movement) of robotic systems.
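The velocity-augmented count can be sketched in a few lines. The snippet below is an illustration under the assumptions stated in this section: each of the 416 articulated axes has 160 positions, each position can be reached at one of 100 distinguishable velocities, and the on/off cannon states and the lights are unchanged. The variable names are hypothetical.

```python
import math

positions_per_axis = 160   # 160 degree range at 1 degree resolution
velocity_states = 100      # distinguishable arrival velocities per position (assumed)

# (R_i, M_i) pairs for the velocity-augmented fountain model.
dynamic_model = [
    (2, 1383),                                   # on/off water state of every cannon
    (positions_per_axis * velocity_states, 416), # position and velocity of each axis
    (13, 6200),                                  # each light: off or one of twelve colors
]

def capacity_bits(actuators):
    """Capacity in bits: sum of M_i * log2(R_i)."""
    return sum(m * math.log2(r) for r, m in actuators)

k_dynamic = capacity_bits(dynamic_model)
k_added = 416 * math.log2(velocity_states)   # bits contributed by velocity alone

print(f"K with velocity states: {k_dynamic:.0f} bits")
print(f"added by velocity: {k_added:.0f} bits")
```

Because the velocity states multiply the settings of each articulated axis, their contribution in bits is simply the number of axes times the base-2 logarithm of the number of velocity states.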
VI. CONCLUSION
This paper has presented a notion of expressive movement that shifts from a trajectory-based measure to a complexity-oriented one. In particular, we draw a metaphor between machines for mechanization and computation. In this way, the complexity of a mechanism on a machine describes its capability. Here, the measure has been used to analyze order-of-magnitude capabilities of existing robotic platforms.
First, this measure reveals a great dearth in motion imitation of biological systems. Human motion in particular is a subject of robotics research and is often represented with tens (order of magnitude) of degrees of freedom. More sophisticated models in animation reach thousands and tens of thousands of parameters [41,39]. Yet, it took 100 joint angles to parameterize the motion of the simple C. elegans, which has only 302 neurons. Thus, through the perspective presented here, the analysis of biological movement may offer an inroad to analyzing the complexity of animals, their brains, and the ability to create robotic systems that can mimic them.
Further, this measure has been used to show a stagnant trend in robotics: while computational resources have been heaped onto platforms, mechanical capacity has stayed on the same order of magnitude. While computers have flourished due to the exponential growth of their capacity, robots have not focused on this area, favoring sophisticated control methods and computational decision models. Indeed, factories with many robotic manipulators may be viewed as systems which begin to show growth on the y-axis of the plot in Figure 4.
Finally, the paper presents a formalization of the notion that the motion of robotic platforms can be used to communicate. Indeed many very low degree of freedom platforms have been used to do such a thing, with little regard to the fundamental capacity such platforms (with only translation and orientation at their disposal such as in the case of UAVs and UGVs) have for communication [36,17]. Note that these methods could also be used in multi-agent systems for robot-to-robot communication (although typically moving is more 'expensive' than communicating for power-constrained systems).
Ongoing work is investigating extensions of this paper in a few distinct directions. First, we are interested in how this can be used to formulate experiments on how humans perceive movement, as in [21,32,25], in the context of robotics. Second, we are interested in how this method may inform architecture development, such as the use of motion primitives in teleoperation, where input versus output parameters of the system can be viewed as a compression channel (and we hypothesize that good compression channels in this context will perform better for users). Finally, we are investigating how computation and mechanization may be specified for a machine in a synergistic manner to improve performance.

ACKNOWLEDGMENTS
This work was funded by DARPA award #D16AP00001. Jialu Li and Varun Jain did essential work in helping to collect and estimate data on each platform reviewed.

For computational configurations, the following values were used. Here, $C$ is $2^x$ where $x$ is the number of transistors. Indeed, often, another, larger computer (or cluster of processors) is networked to these machines through wireless or wired