Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules

2017 IEEE International Conference on Computer Vision (ICCV), 2017

Citations: 105 | Views: 68
Abstract
Gesture is a natural interface for interacting with wearable devices such as VR/AR helmets and glasses. The main challenge of gesture recognition in egocentric vision arises from the global camera motion caused by the spontaneous head movement of the device wearer. In this paper, we address the problem with a novel recurrent 3D convolutional neural network for end-to-end learning. We design a spatiotemporal transformer module with recurrent connections between neighboring time slices, which can actively transform a 3D feature map into a canonical view in both the spatial and temporal dimensions. To validate our method, we introduce a new dataset of sufficient size, variation, and realism, which contains 83 gestures designed for interaction with wearable devices and more than 24,000 RGB-D gesture samples from 50 subjects captured in 6 scenes. On this dataset, we show that the proposed network outperforms competing state-of-the-art algorithms. Moreover, our method achieves state-of-the-art performance on the challenging GTEA egocentric action dataset.
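The idea of transforming a 3D feature map into a canonical view can be illustrated with a small sketch. The code below is not the paper's implementation: it shows only the warping step, applied to a single-channel (T, H, W) volume with a fixed 3x4 affine matrix and trilinear sampling. The names `trilinear_sample`, `warp_volume`, and `theta` are hypothetical, coordinates are plain voxel indices rather than normalized ones, and the network that would predict `theta` (including the recurrent connections between neighboring time slices) is omitted.

```python
import math


def trilinear_sample(vol, t, y, x):
    """Sample vol[t][y][x] at fractional coordinates.

    The eight neighboring voxels are blended by distance; neighbors that
    fall outside the volume contribute zero (zero padding, an assumption).
    """
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    t0, y0, x0 = math.floor(t), math.floor(y), math.floor(x)
    value = 0.0
    for dt in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                ti, yi, xi = t0 + dt, y0 + dy, x0 + dx
                if 0 <= ti < T and 0 <= yi < H and 0 <= xi < W:
                    weight = (
                        (1 - abs(t - ti))
                        * (1 - abs(y - yi))
                        * (1 - abs(x - xi))
                    )
                    value += weight * vol[ti][yi][xi]
    return value


def warp_volume(vol, theta):
    """Warp a (T, H, W) volume with a 3x4 affine matrix `theta`.

    For every output voxel (t, y, x), `theta` maps the homogeneous
    coordinate (t, y, x, 1) to the source location that is sampled.
    """
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    out = []
    for t in range(T):
        plane = []
        for y in range(H):
            row = []
            for x in range(W):
                src_t, src_y, src_x = (
                    r[0] * t + r[1] * y + r[2] * x + r[3] for r in theta
                )
                row.append(trilinear_sample(vol, src_t, src_y, src_x))
            plane.append(row)
        out.append(plane)
    return out


if __name__ == "__main__":
    # Toy volume: 2 time slices of 2x3 features.
    vol = [[[float(t * 100 + y * 10 + x) for x in range(3)]
            for y in range(2)] for t in range(2)]
    # Shift sampling by one voxel along x, as a stand-in for
    # compensating a global horizontal camera motion.
    shift_x = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
    print(warp_volume(vol, shift_x))
```

Because the sampling weights are piecewise linear in the entries of `theta`, a warp of this kind is differentiable almost everywhere, which is what would allow such a module to be trained end to end inside a larger network.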
Keywords
egocentric vision, global camera motion, spontaneous head movement, end-to-end learning, spatiotemporal transformer module, recurrent connections, neighboring time slices, 3D feature map, spatial dimensions, temporal dimensions, wearable devices, egocentric gesture recognition, recurrent 3D convolutional neural networks, natural interface, GTEA egocentric action dataset, VR/AR helmet, VR/AR glasses, RGB-D gesture samples