Temporal grafter network: Rethinking LSTM for effective video recognition

Neurocomputing (2022)

Abstract
Long short-term memory (LSTM) networks are widely used to handle temporal or sequential data and have great potential for video recognition. Existing LSTM-based video recognition methods either insert LSTM modules at the end of 2D convolutional neural networks (CNNs), called global LSTM methods, or build networks solely by stacking multiple LSTM modules. Unfortunately, these LSTM-based methods are not competitive with state-of-the-art 3D CNNs or two-stream CNNs. To fully explore the potential of LSTM, this paper rethinks its role in video recognition architectures and proposes a novel Temporal Grafter Network (TGN). Specifically, we develop an efficient and effective variant of the convolutional LSTM module, which is grafted between different stages of very deep 2D CNNs for temporal modeling and delivery. Our TGN can capture local motion patterns of varying scales inherent in feature maps from high to low resolutions, while attending to spatial context information and modeling global temporal dependency across the whole video. The proposed TGN can capture and transmit temporal information throughout very deep 2D CNNs, overcoming the limitations of existing LSTM-based methods and fully exploiting LSTM for effective video recognition and early action recognition. We perform extensive ablation studies to verify the effectiveness of the proposed methods, and experiments on three widely used video benchmarks show that our methods achieve performance matching or surpassing the state of the art.
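The abstract gives only a high-level description of the grafting mechanism. As a rough illustration, the PyTorch sketch below inserts a convolutional LSTM cell between two stages of a plain 2D CNN, so that per-frame feature maps are temporally aggregated before being passed to the next stage; every module name, channel width, and the toy two-stage backbone are assumptions for illustration, not the authors' actual TGN implementation.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard convolutional LSTM cell: all four gates from one 2D conv."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One conv computes the input, forget, cell, and output gates at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class GraftedBackbone(nn.Module):
    """Toy backbone: 2D stage -> grafted ConvLSTM -> 2D stage (assumed sizes)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.graft = ConvLSTMCell(32, 32)      # temporal modeling between stages
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.head = nn.Linear(64, num_classes)

    def forward(self, clip):                   # clip: (B, T, 3, H, W)
        B, T = clip.shape[:2]
        h = c = None
        feats = []
        for t in range(T):                     # frames processed sequentially
            x = self.stage1(clip[:, t])
            if h is None:                      # lazily create recurrent state
                h = x.new_zeros(B, self.graft.hid_ch, *x.shape[2:])
                c = h.clone()
            x, (h, c) = self.graft(x, (h, c))  # hidden state carries motion cues
            feats.append(self.stage2(x).mean(dim=(2, 3)))  # global average pool
        return self.head(torch.stack(feats, 1).mean(1))    # average over time

clip = torch.randn(2, 8, 3, 64, 64)            # 2 clips of 8 frames each
logits = GraftedBackbone()(clip)               # -> shape (2, 10)

Because the recurrent state is updated frame by frame, a prediction can in principle be read out at any intermediate time step, which is consistent with the early action recognition use case mentioned in the abstract.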
Keywords
Long short-term memory, Deep convolutional neural networks, Temporal grafter network, Video recognition