Attentive Task-Net - Self Supervised Task-Attention Network for Imitation Learning using Video Demonstration.

ICRA (2020)

Abstract
This paper proposes an end-to-end self-supervised feature representation network, Attentive Task-Net (AT-Net), for video-based task imitation. AT-Net incorporates a novel multi-level spatial attention module that highlights the spatial features corresponding to the task demonstrated by the expert. Its neural connections amplify the relevant information in the demonstration and suppress the irrelevant information while learning task-specific feature embeddings. This is achieved by a weighted combination of multiple intermediate feature maps of the input image, taken at different stages of the CNN pipeline, where the weights are the compatibility scores that the attention module predicts for the respective feature maps. AT-Net is trained with a metric learning loss that decreases the distance between the feature representations of concurrent frames from multiple viewpoints and increases the distance between temporally consecutive frames. The AT-Net features are then used to formulate a reinforcement learning problem for task imitation. Experiments on the publicly available Multi-view pouring dataset show that the output of the attention module highlights the task-specific objects while suppressing the background. The efficacy of the proposed method is further validated by qualitative and quantitative comparison with a state-of-the-art technique, along with extensive ablation studies. The method is applied to imitate a pouring task, for which an RL agent is trained with AT-Net in the Gazebo simulator. Our findings show that AT-Net achieves a 6.5% decrease in alignment error, along with a reduction in the number of training iterations of almost 155k, relative to the state of the art, while satisfactorily imitating the intended task.
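The two mechanisms the abstract describes, attention-weighted pooling of feature maps and a multi-view metric learning loss, can be illustrated with a minimal sketch. This is one possible reading under stated assumptions, not the authors' implementation: the dot-product compatibility score, the softmax normalisation, the hinge form of the loss, the margin value, and all function names (`attend`, `triplet_loss`, and so on) are hypothetical choices made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(local_features, global_feature):
    """Pool one intermediate feature map into a single vector.

    local_features: list of per-location feature vectors from one CNN stage.
    global_feature: vector of the same length summarising the whole image.
    ASSUMPTION: compatibility is a plain dot product, normalised by softmax.
    """
    scores = [
        sum(l * g for l, g in zip(loc, global_feature))
        for loc in local_features
    ]
    weights = softmax(scores)
    dim = len(global_feature)
    pooled = [
        sum(w * loc[d] for w, loc in zip(weights, local_features))
        for d in range(dim)
    ]
    return pooled, weights


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss over embeddings.

    anchor, positive: the same instant seen from two viewpoints (pulled together).
    negative: a temporally nearby frame from the anchor's view (pushed apart).
    ASSUMPTION: the margin of 0.2 is illustrative, not a value from the paper.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


if __name__ == "__main__":
    # Two spatial locations; the first aligns with the global descriptor.
    pooled, weights = attend([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
    print("attention weights:", weights)
    print("loss:", triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0]))
```

In this sketch, a multi-level embedding would be formed by calling `attend` once per CNN stage and combining the pooled vectors (for example by concatenation, again an assumption), so locations with high compatibility dominate the embedding and low-compatibility background locations contribute little.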
Keywords
task-specific objects, intended task, imitation learning, video demonstration, end-to-end self-supervised feature representation network, video-based task imitation, multilevel spatial attention module, spatial features, weighted combination, multiple intermediate feature maps, respective feature maps, metric learning loss, multiple view points, AT-Net features, reinforcement learning problem, attentive task-net, self supervised task-attention network, neural connections, learning task-specific feature embeddings, temporally consecutive frames, publicly available multiview pouring dataset, RL agent, Gazebo simulator, CNN pipeline