GFNet: A Lightweight Group Frame Network for Efficient Human Action Recognition

2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

Abstract
Human action recognition aims at assigning an action label to a well-segmented video. Recent work using two-stream or 3D convolutional neural networks achieves high recognition rates at the cost of large computational complexity, memory footprint, and parameter counts. In this paper, we propose a lightweight neural network called Group Frame Network (GFNet) for human action recognition, which imposes intra-frame spatial information sparsity along the spatial dimension in a simple yet effective way. Benefiting from two core components, namely the Group Temporal Module (GTM) and the Group Spatial Module (GSM), GFNet reduces irrelevant motion inside frames and duplicated texture features across frames, and can thus extract the spatial-temporal information of frames at a minuscule cost. Experimental results on the NTU RGB+D dataset and the Varying-view RGB-D Action dataset show that our method, without any pre-training strategy, reaches a reasonable trade-off among computational complexity, parameters, and performance, and is more cost-efficient than state-of-the-art methods.
Keywords
Human Action Recognition, Lightweight Network, Convolutional Neural Network
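
The abstract attributes GFNet's efficiency to processing frames in groups through a Group Temporal Module and a Group Spatial Module. As a rough illustration of why grouping cuts cost, the PyTorch sketch below factorizes a dense 3D convolution into grouped temporal and spatial convolutions; the block structure, kernel sizes, and group count are assumptions made for illustration, not the authors' actual GTM/GSM design.

```python
# Illustrative sketch (not the paper's code): grouped, factorized 3D convolutions
# as a stand-in for the idea of processing frames in groups to save parameters.
import torch
import torch.nn as nn


class GroupedFrameBlock(nn.Module):
    """Hypothetical block: each convolution only mixes channels within a group,
    so parameter count shrinks roughly by the group factor."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        # Grouped temporal convolution: mixes information across frames
        # within each channel group only (a "GTM"-style stand-in).
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), groups=groups, bias=False)
        # Grouped spatial convolution: captures per-frame appearance
        # within each channel group only (a "GSM"-style stand-in).
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), groups=groups, bias=False)
        self.bn = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        return self.relu(self.bn(self.spatial(self.temporal(x))))


if __name__ == "__main__":
    x = torch.randn(2, 64, 8, 56, 56)   # 2 clips, 64 channels, 8 frames, 56x56
    grouped = GroupedFrameBlock(64, groups=4)
    dense = nn.Conv3d(64, 64, kernel_size=3, padding=1, bias=False)
    print(grouped(x).shape)              # torch.Size([2, 64, 8, 56, 56])
    n_grouped = sum(p.numel() for p in grouped.parameters())
    n_dense = sum(p.numel() for p in dense.parameters())
    print(n_grouped, n_dense)            # grouped block uses far fewer weights
```

With 64 channels and 4 groups, this factorized grouped block holds roughly an order of magnitude fewer weights than a single dense 3x3x3 convolution, which is the general kind of saving a lightweight group-frame design targets.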