Translating Videos to Commands for Robotic Manipulation with Deep Recurrent Neural Networks

2018 IEEE International Conference on Robotics and Automation (ICRA)

Abstract
We present a new method to translate videos to commands for robotic manipulation using Deep Recurrent Neural Networks (RNNs). Our framework first extracts deep features from the input video frames with a deep Convolutional Neural Network (CNN). Two RNN layers with an encoder-decoder architecture are then used to encode the visual features and sequentially generate the output words as the command. We demonstrate that the translation accuracy can be improved by allowing a smooth transition between the two RNN layers and by using a state-of-the-art feature extractor. Experimental results on our new challenging dataset show that our approach outperforms recent methods by a fair margin. Furthermore, we combine the proposed translation module with a vision and planning system to let a robot perform various manipulation tasks. Finally, we demonstrate the effectiveness of our framework on the full-size humanoid robot WALK-MAN.
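To illustrate the architecture described above, the following is a minimal PyTorch sketch of an encoder-decoder RNN that maps a sequence of per-frame CNN features to a word-level command. It assumes the visual features have already been extracted by a pretrained CNN; the layer sizes, vocabulary size, and module names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: encoder-decoder RNN for video-to-command translation.
# Assumes per-frame CNN features are pre-extracted; all dimensions are illustrative.
import torch
import torch.nn as nn


class VideoToCommand(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=128, embed_dim=256):
        super().__init__()
        # Encoder RNN reads the sequence of per-frame visual features.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Decoder RNN generates the command one word at a time.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, command_tokens):
        # frame_feats: (batch, num_frames, feat_dim) CNN features
        # command_tokens: (batch, cmd_len) word indices (teacher forcing)
        _, (h, c) = self.encoder(frame_feats)
        # Initialising the decoder with the encoder's final state gives the
        # hand-off between the two RNN layers that the abstract refers to.
        dec_in = self.embed(command_tokens)
        dec_out, _ = self.decoder(dec_in, (h, c))
        return self.out(dec_out)  # (batch, cmd_len, vocab_size) logits


if __name__ == "__main__":
    model = VideoToCommand()
    feats = torch.randn(2, 30, 2048)      # 2 clips, 30 frames of features each
    cmds = torch.randint(0, 128, (2, 6))  # 2 commands, 6 tokens each
    logits = model(feats, cmds)
    print(logits.shape)  # torch.Size([2, 6, 128])
```

In practice the word with the highest logit at each step would be fed back as the next decoder input at inference time, producing the command token by token.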
Keywords
feature extraction,video translation,CNN,RNN,full-size humanoid robot WALK-MAN,manipulation tasks,translation module,visual features,encoder-decoder architecture,RNN layers,deep Convolutional Neural Networks,input video frames,deep features,command,Deep Recurrent Neural Networks,robotic manipulation