Learning to Represent Spatio-Temporal Features for Fine Grained Action Recognition

2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS)

Abstract
Convolutional neural networks have pushed the boundaries of action recognition in videos, especially with the introduction of 3D convolutions. However, how efficiently a 3D CNN can model temporal information remains an open question, which we investigate here, and we introduce a new optical flow representation to improve the motion stream. We take baseline inflated 3D CNN networks and separate the convolutional filters into spatial and temporal components, which reduces the number of parameters with minimal loss of accuracy. We evaluate our approach on the NTU RGBD dataset, the largest human action dataset, and outperform the state of the art by a large margin.
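To illustrate the filter separation the abstract describes, the sketch below shows how a full 3D convolution can be factorized into a spatial (1 x k x k) convolution followed by a temporal (k x 1 x 1) convolution in PyTorch. This is a minimal sketch, not the authors' implementation: the class name FactorizedConv3d, the channel sizes, and the choice of intermediate width are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): replacing a k x k x k 3D
# convolution with a spatial conv followed by a temporal conv.
import torch
import torch.nn as nn


class FactorizedConv3d(nn.Module):
    """Spatial (1 x k x k) convolution followed by a temporal (k x 1 x 1)
    convolution, reducing parameters relative to a full 3D convolution."""

    def __init__(self, in_channels, out_channels, k=3, mid_channels=None):
        super().__init__()
        if mid_channels is None:
            # Hypothetical choice of intermediate width; the paper may use another.
            mid_channels = out_channels
        pad = k // 2
        # Spatial convolution over (H, W) only
        self.spatial = nn.Conv3d(in_channels, mid_channels,
                                 kernel_size=(1, k, k), padding=(0, pad, pad))
        # Temporal convolution over T only
        self.temporal = nn.Conv3d(mid_channels, out_channels,
                                  kernel_size=(k, 1, 1), padding=(pad, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x has shape (batch, channels, time, height, width)
        x = self.relu(self.spatial(x))
        return self.relu(self.temporal(x))


# Parameter comparison against a full 3D convolution with the same channels.
full = nn.Conv3d(64, 128, kernel_size=3, padding=1)
factored = FactorizedConv3d(64, 128, k=3)
print(sum(p.numel() for p in full.parameters()))      # ~221k parameters
print(sum(p.numel() for p in factored.parameters()))  # ~123k parameters
```

With these example channel sizes, the factorized block uses roughly half the parameters of the full 3D convolution, which is the kind of reduction the abstract refers to.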
Keywords
action recognition,3D convolutions,optical flow