Learning to Push by Grasping: Using Multiple Tasks for Effective Learning
End-to-end learning frameworks are becoming increasingly popular in robotic control. These frameworks take states or images directly as input and output predicted torques or action parameters. However, they have been criticized for being data-hungry, raising questions about their scalability. In particular, does end-to-end learning require a separate model for every task? Intuitively, sharing across tasks should help, since different tasks require some common understanding of the environment. This paper explores the next step for data-driven end-to-end learning frameworks: moving from task-specific models to a joint model for multiple robotic tasks. The results are surprising: with the same amount of data, multi-task learning outperforms single-task learning. For example, on the grasping task, a model trained with 2.5k grasp examples and 2.5k push examples performs better than a model trained with 5k grasp examples alone.
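The joint-model idea described above is commonly realized as a shared feature trunk with task-specific output heads, so that data from one task (e.g. pushing) also trains the representation used by another (e.g. grasping). The sketch below is a minimal, hypothetical illustration of this weight-sharing structure; all dimensions and layer choices are assumptions, not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not specify the architecture.
FEAT_IN, FEAT_SHARED = 64, 32   # input encoding size, shared-trunk width
GRASP_OUT, PUSH_OUT = 18, 5     # e.g. grasp-angle bins vs. push parameters

# Shared trunk: one set of weights reused by both tasks, so updates
# driven by push data also shape the features used for grasping.
W_shared = rng.normal(scale=0.1, size=(FEAT_IN, FEAT_SHARED))

# Task-specific heads on top of the shared representation.
W_grasp = rng.normal(scale=0.1, size=(FEAT_SHARED, GRASP_OUT))
W_push = rng.normal(scale=0.1, size=(FEAT_SHARED, PUSH_OUT))

def forward(x, task):
    """Map a batch of state encodings to task-specific outputs."""
    h = np.maximum(x @ W_shared, 0.0)  # shared ReLU trunk
    return h @ (W_grasp if task == "grasp" else W_push)

x = rng.normal(size=(4, FEAT_IN))   # a batch of 4 state encodings
print(forward(x, "grasp").shape)    # (4, 18)
print(forward(x, "push").shape)     # (4, 5)
```

Because `W_shared` appears in both forward paths, every training example from either task contributes gradient signal to the shared representation, which is one plausible mechanism behind the 2.5k+2.5k result.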