Grasping of rigid objects has been reasonably well studied under a wide variety of settings. The common measure of success is whether the robot can hold an object for a few seconds. This is not enough. To obtain a deeper understanding of object manipulation, we propose (1) a task-oriented, part-based model of grasping and (2) BURG, our castle of setups, tools and metrics for building a community around an objective benchmark protocol.
The idea is to boost grasping research by focusing on complete tasks. This calls for attention to object parts, since they are essential for determining how and where the gripper can grasp given the manipulation constraints imposed by the task. Moreover, parts facilitate knowledge transfer to novel objects, across different data sources (virtual and real) and across grippers, making for a versatile and scalable system. The part-based approach extends naturally to deformable objects, for which recognizing the relevant semantic parts, regardless of the object's actual deformation, is essential to make the manipulation problem tractable. Finally, by focusing on parts we can deal more easily with environmental constraints, which are detected and exploited to facilitate grasping.
Regarding the benchmarking of manipulation, robotics has so far suffered from grasping and manipulation work that cannot be compared across studies. Datasets cover only the object detection aspect. Object sets are difficult to obtain and not extensible, and neither scenes nor manipulation tasks are replicable. There are no common tools that address the basic needs of setting up replicable scenes or reliably estimating object pose.
Hence, with the BURG benchmark we propose to focus on community building by providing and sharing tools for reproducible performance evaluation, including the collection of data and feedback from different laboratories in order to study manipulation across different robot embodiments. We will develop a set of repeatable scenarios spanning different levels of quantifiable complexity, determined by the choice of objects, tasks and environments. Examples include fully quantified settings with layers of objects, to which deformable objects and environmental constraints are then added. The benchmark will include metrics that assess the performance of both low-level primitives (object pose, grasp point and type, collision-free motion) and manipulation tasks (stacking, aligning, assembling, packing, handover, folding) that require ordering as well as common-sense knowledge for semantic reasoning.
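To make the notion of a repeatable scenario with a low-level metric more concrete, the following is a minimal, purely illustrative sketch. The scenario format and the metrics of BURG are not specified in this text, so every name below (`ObjectPlacement`, `Scenario`, `pose_error`) is hypothetical, and the choice of a planar pose (position plus yaw) is a simplifying assumption rather than the benchmark's definition.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectPlacement:
    """Hypothetical record of where one object sits in a scene."""
    name: str
    position: tuple  # (x, y, z) in metres
    yaw: float       # rotation about the vertical axis, in radians


@dataclass
class Scenario:
    """Hypothetical description of a scene that another lab could rebuild."""
    task: str
    placements: list = field(default_factory=list)


def pose_error(estimated: ObjectPlacement, reference: ObjectPlacement):
    """Return (translation error in metres, absolute yaw error in radians)."""
    translation = math.dist(estimated.position, reference.position)
    # Wrap the angular difference into [-pi, pi) before taking its magnitude.
    diff = (estimated.yaw - reference.yaw + math.pi) % (2 * math.pi) - math.pi
    return translation, abs(diff)


reference = ObjectPlacement("mug", (0.0, 0.0, 0.0), 0.0)
scenario = Scenario(task="stacking", placements=[reference])
estimate = ObjectPlacement("mug", (0.03, 0.04, 0.0), 2 * math.pi + 0.1)
translation_error, yaw_error = pose_error(estimate, scenario.placements[0])
```

Storing placements explicitly is what would allow a scene to be rebuilt elsewhere and an estimated pose to be scored against the same reference; a full 6-DoF pose and task-level metrics would need a richer representation than this sketch provides.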