Multi-scale temporal feature fusion for few-shot action recognition

2023 IEEE International Conference on Image Processing, ICIP (2023)

Abstract

The aim of this paper is to recognize, in testing (query) videos, actions of interest that are specified by only a few support videos. Our approach centers on a novel temporal enrichment module in which the features describing local temporal contexts in videos are enhanced by collaboratively merging important information from frame-level features, which carry no temporal context. We call this the multi-scale temporal feature fusion (MSTFF) module. By using multiple MSTFF modules that vary the scope of local temporal context extraction, we obtain a discriminative video representation, which is crucial in few-shot tasks where the support videos are insufficient to describe an action class. To stabilize the training of a model with MSTFF and to boost performance, we also learn a local temporal context-level auxiliary classifier in parallel with the main classifier. We analyze the proposed components to demonstrate their importance. We achieve state-of-the-art results on three few-shot action recognition benchmarks: Something-Something V2 (SSv2), HMDB51, and Kinetics.
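
The fusion idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so everything here is an assumption made for illustration, including the function names, the use of sliding-window averaging as the local temporal context extractor, plain dot-product cross-attention without learned projections, the residual addition, and the final averaging and concatenation across window sizes.

```python
import math
import random

def local_context(frames, window):
    """Average each run of `window` consecutive frame features (stride 1).
    Assumption: a stand-in for a learned local temporal context extractor."""
    dim = len(frames[0])
    out = []
    for start in range(len(frames) - window + 1):
        chunk = frames[start:start + window]
        out.append([sum(f[d] for f in chunk) / window for d in range(dim)])
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_fuse(contexts, frames):
    """Enrich each local-context feature (query) with an attention-weighted
    sum of frame-level features (keys and values)."""
    dim = len(frames[0])
    scale = math.sqrt(dim)
    fused = []
    for q in contexts:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in frames]
        weights = softmax(scores)
        attended = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
        # Residual connection (assumption): keep the context feature, add frame-level detail.
        fused.append([qd + ad for qd, ad in zip(q, attended)])
    return fused

def video_representation(frames, windows=(2, 3)):
    """One fusion pass per window size; average over time, then concatenate the scales."""
    dim = len(frames[0])
    rep = []
    for w in windows:
        fused = cross_attention_fuse(local_context(frames, w), frames)
        rep.extend(sum(f[d] for f in fused) / len(fused) for d in range(dim))
    return rep

random.seed(0)
# Toy input: 8 frames with 4-dimensional features (hypothetical sizes).
frames = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
rep = video_representation(frames)
print(len(rep))  # 2 window sizes x 4 dimensions = 8
```

Each window size plays the role of one temporal scale, so widening or narrowing `windows` changes the scope of local context that is fused with the frame-level features. A real model would replace the averaging and raw dot products with learned layers.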

Keywords

Few-shot learning, few-shot action, video representation, temporal fusion, cross-attention
