Multi-scale temporal feature fusion for few-shot action recognition

2023 IEEE International Conference on Image Processing (ICIP), 2023

Abstract
This paper aims to recognize, in test (query) videos, actions of interest that are specified by only a few support videos. Our approach centers on a novel temporal enrichment module in which the features describing local temporal contexts in videos are enhanced by collaboratively merging important information from frame-level features, which carry no temporal context. We call this module a multi-scale temporal feature fusion (MSTFF) module. By using multiple MSTFF modules that vary the scope of local temporal context extraction, we obtain a discriminative video representation, which is crucial in few-shot tasks where the support videos are insufficient to describe an action class. To stabilize the training of a model with MSTFF and to further improve performance, we also train a local temporal context-level auxiliary classifier in parallel with the main classifier. We analyze the proposed components to demonstrate their importance. We achieve state-of-the-art results on three few-shot action recognition benchmarks: Something-Something V2 (SSv2), HMDB51, and Kinetics.
Keywords
Few-shot learning, Few-shot action, video representation, temporal fusion, cross-attention
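
The abstract does not give the internals of the MSTFF module, so the following is only an illustrative sketch of the general idea it describes: local temporal context features at several scales are enriched with information from frame-level features. Everything specific here is an assumption and not the authors' design: the mean-pooling used to form local contexts, the window sizes, the residual connection, the absence of learned projections, and the function names `local_context`, `fuse`, and `multi_scale_fusion`. Cross-attention is used only because it appears among the keywords.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_context(frames, window):
    """Assumed context extractor: mean-pool each run of `window`
    consecutive frame features (stride 1)."""
    num_frames = frames.shape[0]
    return np.stack([frames[t:t + window].mean(axis=0)
                     for t in range(num_frames - window + 1)])

def fuse(context, frames):
    """Assumed fusion: context features act as queries that attend over
    the frame-level features (keys and values), with a residual add.
    No learned projections are used in this sketch."""
    dim = frames.shape[1]
    weights = softmax(context @ frames.T / np.sqrt(dim), axis=-1)
    return context + weights @ frames

def multi_scale_fusion(frames, windows=(2, 3)):
    """One fused feature set per temporal scale (window size)."""
    return [fuse(local_context(frames, w), frames) for w in windows]

# Hypothetical input: 8 frames, each with a 16-dimensional feature.
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))
fused = multi_scale_fusion(frames)
```

With 8 frames, a window of 2 yields 7 context features and a window of 3 yields 6, each keeping the 16-dimensional feature size. A trained model would replace the mean-pooling and raw dot-product attention with learned layers.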