Contrastive Learning for Unsupervised Video Highlight Detection

IEEE Conference on Computer Vision and Pattern Recognition (2022)

Abstract
Video highlight detection can greatly simplify video browsing, potentially paving the way for a wide range of applications. Existing efforts are mostly fully supervised, requiring humans to manually identify and label the interesting moments (called highlights) in a video. Recent weakly supervised methods forgo the use of highlight annotations, but typically require extensive effort in collecting external data such as web-crawled videos for model learning. This observation has inspired us to consider unsupervised highlight detection, where neither frame-level nor video-level annotations are available during training. We propose a simple contrastive learning framework for unsupervised highlight detection. Our framework encodes a video into a vector representation by learning to pick video clips that help distinguish it from other videos via a contrastive objective using dropout noise. This inherently allows our framework to identify video clips corresponding to the highlights of the video. Extensive empirical evaluations on three highlight detection benchmarks demonstrate the superior performance of our approach.
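To make the idea concrete, below is a minimal PyTorch sketch of a dropout-noise contrastive objective of the kind the abstract describes: clips are scored and pooled into a video embedding, the same video is encoded twice so that dropout yields two different views, and an InfoNCE loss treats the two views as positives and other videos in the batch as negatives. All names, dimensions, and the specific scorer/pooling architecture (ClipScoringEncoder, info_nce, clip_dim=512, temperature=0.07) are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClipScoringEncoder(nn.Module):
    """Illustrative encoder: scores clips, pools them into a video vector,
    and relies on dropout noise to produce two differing views of a video."""
    def __init__(self, clip_dim=512, hidden_dim=256, p_drop=0.1):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(clip_dim, hidden_dim), nn.ReLU(),
            nn.Dropout(p_drop),              # dropout noise -> stochastic clip scores
            nn.Linear(hidden_dim, 1),
        )
        self.proj = nn.Sequential(nn.Dropout(p_drop), nn.Linear(clip_dim, hidden_dim))

    def forward(self, clips):                # clips: (B, N, clip_dim) precomputed features
        w = torch.softmax(self.scorer(clips).squeeze(-1), dim=-1)   # (B, N) clip weights
        video = torch.einsum("bn,bnd->bd", w, clips)                # weighted clip pooling
        return F.normalize(self.proj(video), dim=-1), w

def info_nce(z1, z2, temperature=0.07):
    """Standard InfoNCE: two dropout-noised views of the same video are positives,
    the other videos in the batch are negatives."""
    logits = z1 @ z2.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Usage sketch: encode the same clip features twice; dropout makes the two
# embeddings differ, and the contrastive loss pulls them together while pushing
# away other videos. The learned clip weights `w` then act as highlight scores.
model = ClipScoringEncoder()
clips = torch.randn(8, 32, 512)              # 8 toy videos, 32 clips each
z1, w1 = model(clips)
z2, _ = model(clips)
loss = info_nce(z1, z2)
loss.backward()
```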
Keywords
Video analysis and understanding; Recognition: detection, categorization, retrieval; Representation learning; Self- & semi- & meta-learning; Vision applications and systems