How humans grasp three-dimensional objects

We rarely experience difficulty picking up objects, yet of all potential grasp points on an object's surface, only a small proportion yield stable, comfortable grasps. Here, we present extensive behavioral data alongside a computational model that correctly predicts human precision grasping of unfamiliar 3D objects. We tracked participants' forefinger and thumb as they picked up objects composed of 10 wood and brass cubes, configured to tease apart the effects of shape, weight, orientation, and mass distribution. Grasps were highly systematic and consistent across repetitions and participants. The model combines five cost functions related to force closure, torque, natural grasp axis, grasp aperture, and visibility. Even without free parameters, the model predicts human grasps with striking fidelity: indeed, it predicts individual grasps almost as well as different individuals predict one another's. Adding fittable weights to the model reveals the relative importance of the different constraints: the combination of force closure, hand posture, and grasp size explains most of human grasping behavior, while our participants cared surprisingly little about minimizing torque or optimizing object visibility. Together, these findings provide a unified account of how we derive effective grasps from objects' 3D shape and material properties in order to interact with them successfully.

Significance Statement

Working out how we pick up and interact with objects effectively is one of the most important challenges in behavioral science. Of all the potential contact points on an object's surface, only a small proportion yield effective grasps. Despite this, we rarely experience any difficulty choosing where and how to pick objects up. Here, we present a computational model that unifies the varied and fragmented literature on human grasp selection. We find that the model correctly predicts human grasps across a wide variety of conditions, taking into account the object's 3D shape, material properties, and orientation.


In everyday life, we effortlessly grasp and pick up objects without much thought. However, this ease belies the computational complexity of the task. Even state-of-the-art robotic AIs fail to grip objects nearly 20% of the time (1). To pick something up, our brains must work out which locations on the object will lead to stable, comfortable grasps, so that we can perform the desired actions (Figure 1a). Most potential grasps would actually be unsuccessful, e.g., requiring thumb and forefinger to cross, or failing to exert useful forces (Figure 1b). Many other possible grasps would be unstable, e.g., grasping an object too far from its center, so that it rotates once we try to lift it (Figure 1c). Somehow, the brain must infer which, of all potential grasps, would actually succeed. Despite this, we rarely drop objects or find ourselves unable to complete actions because we are holding objects inappropriately. How does the brain select stable, comfortable grasps onto arbitrary 3D objects, particularly objects we have never seen before?

[Figure 3, caption fragment: (g) Attraction towards the object CoM for grasps executed onto light (Experiment 1) and heavy (Experiment 2) objects, compared to grasps uniformly distributed on the object surfaces (zero reference). (h) Human grasps from Experiment 2 onto object S presented at orientation 2. (i) Attraction towards the object CoM compared to Experiment 1 grasps (zero reference), for Experiment 2 grasps onto heavy objects whose CoM is closer than, at the same distance as, or farther than the CoM of the light wooden objects from Experiment 1. In all panels, error bars/regions represent 95% bootstrapped confidence intervals. ** p<0.01, *** p<0.001]

To further quantify how clustered these grasping patterns are, we designed a simple metric of similarity between grasps (see Methods). Figure 3d shows that both between- and within-subject grasp similarity are significantly higher than the similarity expected between random grasps due to object geometry alone (t(7)=9.76, p=2.5×10⁻⁵ and t(7)=25.11, p=4.1×10⁻⁸, respectively). Additionally, within-subject grasp similarity is significantly higher than between-subject similarity (t(7)=3.89, p=0.0060). Nevertheless, the high similarity between grasps from different participants demonstrates that different humans tend to grasp objects in similar ways. The even higher within-subject similarity further demonstrates that grasp patterns from individual participants are idiosyncratic, which may reflect differences in the strategies employed by individual participants.

Our findings reproduce several known effects in grasp selection. First, previous research suggests that haptic space is encoded in both egocentric and allocentric coordinates (28), and that grasps are at least partly encoded in egocentric coordinates to account for the biomechanical constraints of our arm and hand (16). Our findings reproduce and extend these observations. For each object we computed grasp similarity across the two orientations in both egocentric and allocentric coordinates. Figure 3e shows that, as the extent of the object rotation increases, grasp encoding shifts from allocentric to egocentric coordinates. Across small rotations (object S, 55 deg rotation), grasps are more similar if encoded in allocentric coordinates (t(11)=13.90, p=2.5×10⁻⁸), whereas for large rotations (object L, 180 deg) grasps are more similar if encoded in egocentric coordinates (t(11)=4.59, p=7.8×10⁻⁴). Therefore, both 3D shape and movement constraints influence grasps.

Second, Figure 3f shows that participants selected grasp locations that were on average 26 mm closer to the hand's starting location than the object centroid (t(11)=9.74, p=9.6×10⁻⁷), reproducing known spatial biases in human grasp selection (12, 25, 27, 29, 30).

Third, if participants sought to minimize torque, their selected grasps should be as close as possible to the CoM. In contrast, Figure 3g shows that for the lightweight objects in Experiment 1, grasps were on average 9 mm farther from the CoM than grasps sampled uniformly from the surface of the objects (t(11)=4.53, p=8.6×10⁻⁴).
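For concreteness, the uniform baseline used here and below can be generated by area-weighted sampling of the objects' triangulated meshes (described in Methods). The following is a minimal illustrative sketch, not the authors' code; all names are ours:

```python
import numpy as np

def sample_surface_points(vertices, faces, n, rng=np.random.default_rng(0)):
    """Draw n points uniformly from the surface of a triangulated mesh.

    vertices: (V, 3) array of vertex coordinates.
    faces:    (F, 3) integer array of vertex indices per triangle.
    """
    a, b, c = (vertices[faces[:, k]] for k in range(3))
    # Triangle areas from the cross product; larger triangles are
    # proportionally more likely to receive a sample.
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    tri = rng.choice(len(faces), size=n, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick.
    s = np.sqrt(rng.random(n))
    t = rng.random(n)
    u, v, w = 1.0 - s, s * (1.0 - t), s * t
    return u[:, None] * a[tri] + v[:, None] * b[tri] + w[:, None] * c[tri]

# A "random grasp" baseline is then a pair of surface points per draw:
# thumb_pts = sample_surface_points(V, F, 1000)
# index_pts = sample_surface_points(V, F, 1000)
```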

Experiment 2: Mass and Mass Distribution
Humans grasp objects close to their center of mass when high grip torques are possible. Due to the low density of beech wood, even the grasps farthest from the CoM in Experiment 1 would produce relatively low torques. Therefore, in Experiment 2 we tested whether participants grasp objects closer to the CoM when higher torques are possible. We did this by using objects of greater mass and with asymmetric mass distributions. Specifically, for each of the shapes in Experiment 1, we made three new objects, each made of five brass and five wooden cubes: two 'bipartite' objects, with brass clustered on one or the other half of the object, and one 'alternating' object, with brass and wood alternating along the object's length. These objects had the same 3D shapes as in Experiment 1, but were nearly tenfold heavier (Figure 2c, see Methods).

Figure 3g shows that human grasps are indeed significantly attracted towards the CoM of heavy objects, presumably to counteract the larger torques associated with higher mass. In Experiment 2, grasps were on average 11 mm closer to the object CoM than grasps sampled uniformly from the objects' surfaces (t(13)=4.94, p=2.7×10⁻⁴), and on average 20 mm closer than the grasps from Experiment 1 (t(24)=6.63, p=7.4×10⁻⁷). Importantly, participants shifted their grasps towards the CoM, not the geometrical centroid, of the objects (observe how the grasp patterns shift in Figure 3h). Figure 3i shows that when the object CoM was shifted towards the hand starting location, participants did not significantly adjust their grasping strategy compared to Experiment 1 (t(13)=0.81, p=0.43). Conversely, when the object CoM was in the same position as in Experiment 1, participants shifted their grasps on average by 8 mm towards the CoM (t(13)=3.92, p=0.0017). When the object CoM was shifted away from the hand starting position, participants' grasps were on average 37 mm closer to the object CoM compared to Experiment 1 grasps (t(13)=8.49, p=1.2×10⁻⁶), a significantly greater shift than in both the near and same CoM conditions (t(13)=8.66, p=9.2×10⁻⁷ and t(13)=7.58, p=4.0×10⁻⁶). These differential shifts indicate that participants explicitly estimated each object's CoM from visual material cues.

Even with the heavier objects, participants still systematically selected grasp locations that were closer to the starting location than the object centroid (t(13)=4.03, p=0.0014). However, participants now exhibited only a 9 mm bias, which was significantly smaller than the bias observed in Experiment 1.

Together, these findings suggest that participants combine multiple constraints to select grasp locations, taking into consideration the shape, weight, orientation, and mass distribution of objects, as well as properties of their own body, to decide where to grasp objects. We next sought to develop a unifying model that could predict these diverse effects based on a few underlying principles.

To this end, we assign every possible combination of thumb and forefinger contact points on an object's surface a cost under a set of penalty functions (Figure 4). We reasoned that humans would tend to grasp objects at or close to the minima of this cost function, as these would yield the most stable, comfortable grasps. Low-cost grasps can then be projected back onto the object to compare against human grasps. It is important to note that this is not intended as a process model describing internal visual or motor representations (i.e., we do not suggest that the human brain explicitly evaluates grasp cost for all possible surface locations).
Rather, the model is a way of combining a subset of the factors known to influence human grasp selection into a single, unifying framework (12).

For each object, we create a triangulated mesh model in a 3D coordinate frame, from which we can sample candidate grasps (Figure 4a). Each candidate grasp is evaluated under five penalty functions:

Force closure: to lift an object, thumb and forefinger must be able to exert forces that hold it stably; for a two-digit grasp this requires the surface normals at the two contact points to be approximately opposed. We therefore penalize grasps that deviate from force closure.

Minimum torque: grasping an object far from its CoM results in high torque, which causes the object to rotate when picked up (15-19). Large gripping forces would be required to prevent the object from rotating. We therefore penalize torque magnitude.

Natural grasp axis: when executing precision grip grasps, humans exhibit a preferred hand posture known as the natural grasp axis (16, 20-22). Grasps that are rotated away from this axis result in uncomfortable or restrictive hand/arm configurations. We therefore penalize angular misalignment between each candidate grasp and the natural grasp axis (taken from (21)). Unlike the force closure and torque penalty maps, this penalty map is asymmetric about the diagonal.

Optimal grasp aperture: humans prefer the distance between finger and thumb at contact ('grasp aperture') to be below 2.5 cm (23). We therefore penalize grasp apertures above 2.5 cm.

Optimal visibility: our behavioral data, and previous studies, suggest that humans exhibit spatial biases when grasping. It has been proposed that these may arise from an attempt to minimize energy expenditure through shorter reach movements (24). However, Paulun et al. (25) have shown that these biases may in fact arise from participants attempting to optimize object visibility. While our current dataset was not designed to untangle these competing hypotheses, re-analyzing published data (19, 27) confirms that object visibility, not reach length, is most likely responsible for the biases. We therefore penalized grasps that hindered object visibility. We also designed a penalty function for reach length and verified that, since reach length and object visibility are correlated in our dataset, employing one or the other penalty function yields very similar results.

We assume that participants select grasps with low overall costs across all penalty functions. Thus, to create the overall grasp penalty function, we take a (weighted) linear sum of the individual penalty maps. The minima of this full penalty map represent grasps that best satisfy all criteria simultaneously. The map in Figure 5d exhibits a clear minimum: the white region in its lower right quadrant.

To assess the agreement between human and optimal grasps, we can visualize human grasps in the 2D representation of the grasp manifold. The red markers in Figure 5 show human grasps, which cluster around the minima of the full penalty maps. However, different participants may weight the individual constraints differently (e.g., due to strength or hand size). Therefore, we developed a method for fitting full penalty maps to participants' responses. We assigned variable weights to each optimality criterion and fit these weights to the grasping data from each participant, to obtain a set of full penalty maps whose minima best align with each participant's grasps (see Methods).

Unfitted model grasps were significantly more similar to human grasps than chance (t(31)=10.79, p=5.0×10⁻¹²), and effectively indistinguishable from human-level grasp similarity (t(31)=0.31, p=0.76).
Note that this does not mean our current approach perfectly describes human grasping patterns; it suggests instead that our framework is able to predict the median human grasping pattern nearly as well as the grasps of a random human, on average, approximate the median human grasp.

Analyzing the patterns of fitted weights confirms our empirical findings. The model also replicates our main empirical findings in a single step. Figure 5e shows that the relative importance of torque was much greater for the heavy objects tested in Experiment 2 than for the light objects from Experiment 1 (t(24)=4.40, p=1.9×10⁻⁴). Conversely, Figure 5f shows that the relative importance of object visibility decreased significantly from Experiment 1 to Experiment 2 (t(24)=3.07, p=0.0053). Additionally, by simulating grasps from the fitted model, we are able to recreate the qualitative patterns of all behavioral results presented in Figure 3.

Discussion

We investigated how an object's 3D shape, orientation, mass, and mass distribution jointly influence how humans select grasps. Our empirical analyses showed that grasping patterns are highly systematic, both within and across participants, suggesting that a common set of rules governs human grasp selection of complex, novel 3D objects. Our findings reproduce, unify, and generalize many effects observed previously: (1) both 3D shape and orientation determine which portion of the object people grasp (8, 12, 15, 16, 31-34); (2) humans exhibit spatial biases even with complex 3D objects varying in shape and mass (12, 25, 27, 29, 30); (3) object weight modulates how much humans take torque into account when selecting where to grasp objects (15-19). We then combined this diverse set of observations into a unified theoretical framework that predicts human grasping patterns strikingly well, even with no free parameters. By fitting the computational model to human behavioral data, we showed that force closure, hand posture, and grasp size are the primary determinants of human grasp selection, whereas torque and visibility modulate grasping behavior to a much lesser extent.

Shape and Orientation
Previous research on how object shape and orientation influence grasping has primarily focused on hand kinematics (15, 19, 32, 35). Conversely, with more complex 3D shapes we show that the same portion of an object is selected within a range of orientations relative to the observer, whereas for more extreme rotations the grasp selection strategy shifts significantly. Therefore, object shape and orientation together determine which portion of an object will be grasped, and thus the final hand configuration.

Spatial Biases
The spatial biases we observe are consistent with participants attempting to increase object visibility (25, 27), and our data also replicate the finding that these biases are reduced when object weight increases (19, 25).

Torque
Previous studies have reported conflicting findings as to whether grasp distance to the CoM is modulated by object weight and material. Our findings resolve these conflicts. By using stimuli that decorrelate different aspects of grasp planning, we find that shape and hand configuration are considerably more important than torque for lightweight objects, and that the importance of minimizing torque scales with mass. Additionally, shifting an object's mass distribution significantly attracted grasp locations towards the object's shifted CoM, demonstrating that participants could reliably combine global object shape and material composition to successfully infer the object's CoM.

Computational Modelling
Previous models of grasping have mainly focused on hand kinematics and trajectory synthesis (2-6), whereas we attempt to predict which object locations will be selected during grasping. Our modelling approach takes inspiration from Kleinholdermann et al. (12), which to the best of our knowledge is the only previous model of human two-digit contact point selection, although it applies only to 2D shape silhouettes. In addition to dealing with 3D objects varying in mass, mass distribution, orientation, and position, our modelling addresses several limitations of previous approaches. The fitting procedure quantifies the relative importance of different constraints, and can be applied to any set of novel objects to test how experimental manipulations affect this relative weighting. The modular nature of the model allows additional constraints to be included, excluded, or given variable importance. For example, we know that end-state comfort of the hand plays a role in grip selection (36, 37), yet the tradeoff between initial and final comfort is unclear (38). By varying the participants' task to include object rotations, and by including a penalty function penalizing final hand rotations away from the natural grasp axis, it would be possible to assess the relative importance of initial and final hand comfort. […]

Grasp-relevant object properties are thought to be processed in a parieto-frontal network including premotor areas F5 and F2, together with AIP, which has been shown to represent the shape, size, and orientation of 3D objects, as well as the shape of the handgrip, grip size, and hand orientation (46). Additionally, visual material properties, including object weight, are thought to be encoded in the ventral visual cortex (47-51), and it has been suggested that AIP might play a unique role in linking components of the ventral visual stream involved in object recognition to the hand motor system (52). Therefore, the neural circuit formed between F5, F2, and particularly AIP is a strong candidate for combining the multifaceted components of visually guided grasping identified in this work (53-57). Combining targeted investigations of brain activity with the behavioral and modelling framework presented here holds the potential to develop a unified theory of visually guided grasp selection.

Methods

Stimuli. […] For each of the four shapes from Experiment 1, we created 3 new objects (12 in total) to serve as stimuli for Experiment 2 (Figure 2c). Individual cubes were made of either wood or brass. Each object was composed of 5 cubes of each material, which made the objects fairly heavy, with a mass of 716 g. By reordering the sequence of wood and brass cubes, we shifted the location of each shape's CoM.
For each shape, we made one object in which brass and wooden cubes alternated with one another, and two bipartite objects, in which the 5 brass cubes were connected to one another to make up one side of the object, with the wooden cubes making up the other side. This configuration was also […]

Object meshes. Triangulated mesh replicas of all objects were created in Matlab; each cube face consisted of 128 triangles. To calibrate mesh orientation and position, we measured, using the Optotrak, four non-coplanar points on each object at each orientation. We then aligned the mesh model to the coordinate frame of the Optotrak using Procrustes analysis.
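Matlab's procrustes function recovers this alignment directly. As an illustration, a least-squares rigid alignment (rotation plus translation, i.e., the Kabsch algorithm) from the four measured points could be sketched as follows; the names, and the choice to omit the scaling component, are our assumptions:

```python
import numpy as np

def rigid_align(model_pts, measured_pts):
    """Least-squares rotation R and translation t mapping model_pts
    onto measured_pts (both (N, 3), N >= 3 non-collinear points)."""
    mu_model = model_pts.mean(axis=0)
    mu_meas = measured_pts.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (model_pts - mu_model).T @ (measured_pts - mu_meas)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_meas - R @ mu_model
    return R, t

# Applying the transform to every vertex brings the mesh replica into
# the tracker's coordinate frame:
# R, t = rigid_align(mesh_points, tracker_points)
# aligned_vertices = vertices @ R.T + t
```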

Procedure
Prior to each trial, participants placed thumb and index finger at a pre-specified starting location. In Experiment 1, two start locations were used (start location 1: 28 cm to the right of the chinrest in the participant's coronal plane and 9.5 cm forward in the sagittal plane; start location 2: 9 cm further to the right and 3 cm further forward, 23 cm from the center of the goal plate). Given that we observed […]

Following each trial, the experimenter visually inspected the movement traces to determine whether a grasp was successful. Grasps were deemed unsuccessful when the movement was too slow, when the object was dropped, or when tracking was lost. Unsuccessful grasps were marked as error trials, added to the randomization queue, and repeated. A total of 368 error trials (13.8% of trials from Experiment 1 and 13.9% from Experiment 2) were not analyzed.

Training
Each participant completed six practice trials prior to the experiment (grasping a Styrofoam cylinder in Experiment 1, and lifting random objects from the shapes not used in that participant's run in Experiment 2) to give them a sense of how fast their movement needed to be in order to complete the entire movement within three seconds. Practice trial data were not used in the analyses. Prior to Experiment 2, participants were familiarized with the relative weight of brass and wood using two rectangular cuboids of dimensions 12.5 × 2.5 × 2.5 cm, one of wood (50 g) and one of brass (670 g).

Analyses
All analyses were performed in Matlab version R2018a. Differences between group means were assessed via paired or unpaired t-tests, as appropriate. Values of p<0.05 were considered statistically significant.

Contact points. Contact points of both digits with the object were determined as the fingertip coordinates at the time of first contact, projected onto the surface of the triangulated mesh models of the object. The time of contact with the object was determined using the methods developed by Schot et al. (60) and previously described in Paulun et al. (19).

Grasp similarity. We described each individual grasp $\vec{g}$ as a 6D vector containing the x, y, z coordinates of the thumb and index finger contact points:

$\vec{g} = (x_{\mathrm{thumb}}, y_{\mathrm{thumb}}, z_{\mathrm{thumb}}, x_{\mathrm{index}}, y_{\mathrm{index}}, z_{\mathrm{index}})$

To compute the similarity between two grasps $\vec{g}_1$ and $\vec{g}_2$, we first computed the Euclidean distance between the two 6D grasp vectors. We then divided this distance by the largest possible distance between two points on the specific object, $d_{\max}$, determined from the mesh models of the objects. Finally, similarity was defined as 1 minus the normalized grasp distance, times 100:

$S(\vec{g}_1, \vec{g}_2) = 100 \left( 1 - \frac{\lVert \vec{g}_1 - \vec{g}_2 \rVert}{d_{\max}} \right)$

In this formulation, two identical grasps, which occupy the same point in 6D space, are 100% similar, whereas the two farthest possible grasps onto a specific object are 0% similar.

Torque. If a force is applied at some position away from the CoM, the object will tend to rotate due to torque, given by the cross product of the force vector and the lever arm (the vector connecting the CoM to the point of force application). Under the assumption that it is possible to apply forces at the thumb and index contact points that counteract the force of gravity, we can compute the total torque of a grip as the sum of the torques exerted at each contact point. We therefore defined the torque penalty function as the magnitude of the total torque exerted by a grip:

$T(\vec{g}) = \lVert \vec{r}_{\mathrm{thumb}} \times \vec{F}_{\mathrm{thumb}} + \vec{r}_{\mathrm{index}} \times \vec{F}_{\mathrm{index}} \rVert$

where $\vec{r}_{\mathrm{thumb}}$ and $\vec{r}_{\mathrm{index}}$ are the lever arms from the CoM to the two contact points, and $\vec{F}_{\mathrm{thumb}}$ and $\vec{F}_{\mathrm{index}}$ are the contact forces that jointly counteract gravity.

Optimal grasp aperture for precision grip. Cesari and Newell (23) have shown that, when free to employ any multi-digit grasp, human participants select precision grip grasps only for cubes smaller than 2.5 cm in length; as cube size increases, humans progressively increase the number of digits employed in a grasp. Therefore, since our participants were instructed to employ only precision grip grasps, we defined the optimal grasp aperture penalty function as 0 for grasp apertures smaller than 2.5 cm, and as a linearly increasing penalty for grasp apertures larger than 2.5 cm.
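These quantities are all simple functions of the two contact points. The sketch below (illustrative Python, not the authors' code; the variable names, the millimeter units, and the assumption that the two digits share the supporting load equally are ours) shows one way to implement them, together with the CoM of a composite wood/brass object, which the torque term requires:

```python
import numpy as np

def composite_com(cube_centers, cube_masses):
    """CoM of an object assembled from cubes of different materials."""
    m = np.asarray(cube_masses, dtype=float)
    return (m[:, None] * np.asarray(cube_centers)).sum(axis=0) / m.sum()

def grasp_similarity(g1, g2, d_max):
    """Similarity (%) between two 6D grasp vectors (thumb xyz, index xyz);
    d_max is the largest distance between two points on the object."""
    return 100.0 * (1.0 - np.linalg.norm(g1 - g2) / d_max)

def torque_penalty(thumb, index, com, mass, g=9.81):
    """Magnitude of the total torque about the CoM when the contact
    forces counteract gravity (assumed split equally between digits)."""
    f_per_digit = np.array([0.0, 0.0, mass * g / 2.0])  # upward support force
    torque = np.zeros(3)
    for contact in (thumb, index):
        lever = contact - com  # lever arm from CoM to contact point
        torque += np.cross(lever, f_per_digit)
    return float(np.linalg.norm(torque))

def aperture_penalty(thumb, index, preferred=25.0):
    """Zero below the preferred 2.5 cm aperture (units: mm),
    increasing linearly above it."""
    return max(0.0, np.linalg.norm(thumb - index) - preferred)
```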
Predicting Grasps. The minima of both the equally weighted (non-fitted) and the fitted overall grasp penalty functions represent the set of grasps predicted to be optimal under the weighted linear combination of the five penalty functions included in our computational model. To visualize these predicted optimal grasps, we sampled them from the minima of the penalty functions. First, we discarded all grasps with penalty values above the lowest 0.1th percentile; the remaining grasps were therefore all optimal or near-optimal. From this subset, we then randomly selected (with replacement) a number of grasps equal to the number of grasps executed by the human participants. The probability with which any one grasp was selected was set to 1 minus the grasp penalty, so that grasps with zero penalty had the highest probability of being selected. These sampled grasps can then be projected back onto the objects for visualization (Figure 5a), or compared directly to human grasps using the grasp similarity metric described above (Figure 5b,c).
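As a sketch of this sampling scheme (assuming each candidate grasp has already been assigned a combined penalty in [0, 1], computed as the weighted linear sum of the five normalized penalty maps; names are illustrative):

```python
import numpy as np

def sample_predicted_grasps(penalties, n_human, rng=np.random.default_rng(0)):
    """Sample n_human near-optimal grasps given candidate penalties in [0, 1].

    Returns indices into the candidate set, sampled with replacement."""
    # Keep only optimal or near-optimal candidates: those within the
    # lowest 0.1th percentile of penalty values.
    cutoff = np.percentile(penalties, 0.1)
    near_optimal = np.flatnonzero(penalties <= cutoff)
    # Selection probability is proportional to (1 - penalty), so
    # zero-penalty grasps are the most likely to be drawn.
    weights = 1.0 - penalties[near_optimal]
    return rng.choice(near_optimal, size=n_human, replace=True,
                      p=weights / weights.sum())

# The combined penalty is itself a weighted linear sum of the five
# normalized penalty maps, e.g.:
# penalties = (w_fc * p_force_closure + w_tq * p_torque + w_nga * p_axis
#              + w_ap * p_aperture + w_vis * p_visibility)
```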