Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects Via Transformer
IEEE/ASME Transactions on Mechatronics (2024)
Abstract
Reliable robotic grasping, especially of deformable objects such as fruits, remains challenging because of underactuated contact interactions with the gripper and unknown object dynamics and geometries. In this study, we propose a transformer-based robotic grasping framework for rigid grippers that leverages tactile and visual information for safe object grasping. Specifically, the transformer models learn physical feature embeddings from sensor feedback collected while performing two predefined exploratory actions (pinching and sliding), and a multilayer perceptron predicts the grasping outcome for a given grasping strength. Using these predictions, the gripper infers a safe grasping strength. Compared with convolutional recurrent networks, the transformer models can capture long-term dependencies across image sequences and process spatial and temporal features simultaneously. We first benchmark the transformer models on a public dataset for slip detection. We then show that the transformer models outperform a CNN + LSTM model in both grasping accuracy and computational efficiency. We also collect a new fruit grasping dataset and conduct online grasping experiments with the proposed framework on both seen and unseen fruits. In addition, we extend our model to objects with different shapes and demonstrate the effectiveness of the model pretrained on our large-scale fruit dataset.
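The strength-selection step described in the abstract (predict an outcome for a given grasping strength, then infer a safe strength) could look roughly like the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the outcome labels `slip`, `safe`, and `damage`, the function `select_grasp_strength`, the threshold `min_safe_prob`, and `toy_predictor` (a hand-written stand-in for the transformer plus multilayer perceptron) are all hypothetical names that do not appear in the paper's abstract.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical predictor signature: (feature embedding, grasp strength) -> outcome probabilities.
Predictor = Callable[[Sequence[float], float], Dict[str, float]]


def select_grasp_strength(
    predict: Predictor,
    embedding: Sequence[float],
    candidates: Sequence[float],
    min_safe_prob: float = 0.5,
) -> Optional[float]:
    """Return the weakest candidate strength predicted to be safe, or None.

    Assumption: preferring the weakest safe strength limits the risk of
    damaging a deformable object. The abstract does not state this rule.
    """
    for strength in sorted(candidates):
        probs = predict(embedding, strength)
        most_likely = max(probs, key=probs.get)
        if most_likely == "safe" and probs["safe"] >= min_safe_prob:
            return strength
    return None


def toy_predictor(embedding: Sequence[float], strength: float) -> Dict[str, float]:
    """Stand-in for the learned model: treats embedding[0] as a stiffness value."""
    stiffness = embedding[0]
    if strength < 0.5 * stiffness:   # too weak: the object is likely to slip
        return {"slip": 0.9, "safe": 0.1, "damage": 0.0}
    if strength > 1.5 * stiffness:   # too strong: the object is likely to be damaged
        return {"slip": 0.0, "safe": 0.1, "damage": 0.9}
    return {"slip": 0.05, "safe": 0.9, "damage": 0.05}


if __name__ == "__main__":
    chosen = select_grasp_strength(toy_predictor, [2.0], [3.5, 0.5, 1.0, 2.0])
    print("selected strength:", chosen)
```

In a real system the toy predictor would be replaced by the trained network, and the candidate strengths would come from the gripper's admissible range; both of those details are outside what the abstract specifies.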
Key words
Deep learning, perception for grasping and manipulation, visual and tactile sensing