Video Playback Rate Perception For Self-Supervised Spatio-Temporal Representation Learning

Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

Abstract

In self-supervised spatio-temporal representation learning, temporal resolution and long- and short-term characteristics have not yet been fully explored, which limits the representation capability of learned models. In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representations in a simple yet effective way. PRP is rooted in a dilated sampling strategy, which produces self-supervision signals about video playback rates for representation model learning. PRP is implemented with a feature encoder, a classification module, and a reconstructing decoder, to achieve spatio-temporal semantic retention in a collaborative discrimination-generation manner. The discriminative perception model follows the feature encoder and, by classifying fast-forward rates, favors low-temporal-resolution, long-term representations. The generative perception model acts as a feature decoder and, through a motion-attention mechanism, focuses on high-temporal-resolution, short-term representations. PRP is applied to typical video target tasks, including action recognition and video retrieval. Experiments show that PRP outperforms state-of-the-art self-supervised models by significant margins. Code is available at github.com/yuanyao366/PRP.
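
The dilated sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that): the function names, the clip length of 16, and the candidate rates (1, 2, 4, 8) are assumptions chosen for illustration. The sketch picks a playback rate at random, samples every rate-th frame index, and uses the index of the chosen rate as the classification label.

```python
import random


def dilated_sample(num_frames, clip_len, rate, start=0):
    """Return `clip_len` frame indices taken every `rate` frames from `start`.

    Hypothetical helper: rate=1 is normal playback, larger rates mimic fast-forward.
    """
    span = (clip_len - 1) * rate + 1  # number of source frames the clip covers
    if start < 0 or start + span > num_frames:
        raise ValueError("clip does not fit in the video")
    return [start + i * rate for i in range(clip_len)]


def make_training_sample(num_frames, clip_len=16, rates=(1, 2, 4, 8), rng=random):
    """Pick a random playback rate and return (frame_indices, rate_label).

    The label is the position of the chosen rate in `rates`; it is the
    self-supervision signal a rate classifier would be trained to predict.
    The default clip length and rates are illustrative assumptions.
    """
    label = rng.randrange(len(rates))
    rate = rates[label]
    span = (clip_len - 1) * rate + 1
    start = rng.randrange(num_frames - span + 1)  # raises ValueError if the video is too short
    return dilated_sample(num_frames, clip_len, rate, start), label
```

For example, `dilated_sample(100, 4, 2, 10)` selects frames 10, 12, 14, and 16, which corresponds to a 2x fast-forward clip. No manual annotation is needed because the label comes from the sampling procedure itself.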

Keywords

state-of-the-art self-supervised models, self-supervised spatio-temporal representation learning, long-short term characteristics, representation capabilities, self-supervised method, PRP, self-supervision signals, video playback rates, representation model, feature encoder, spatio-temporal semantic retention, discriminative perception model, low temporal resolution, long-term representation, generative perception model, high temporal resolution, short-term representation, video playback rate perception
