Online Video Summarization: Predicting Future to Better Summarize Present

2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019

Abstract
Automatically generating a video summary is challenging because the task is inherently subjective. Most previous work considers the entire video when extracting the important frames. In contrast, this paper presents MerryGoRoundNet, a supervised learning approach that solves the problem in an online fashion. We observe that effective video summarization must take into account both the spatial and the temporal relations between video frames. MerryGoRoundNet uses an encoder-decoder style architecture and a convolutional LSTM to establish spatiotemporal relationships and generates the summary on the fly, making it more efficient in time and memory than its non-autoregressive counterparts. To make the summary more diverse and complete, we augment the network with an unsupervised next-frame prediction task and a supervised scene-start detection task, and we propose a loss function that explicitly targets the right balance between continuity and diversity in the produced summary. An ablation study supports the architecture and learning objective of our approach. Evaluation on different datasets shows that MerryGoRoundNet achieves superior performance among online summarization approaches and competitive performance compared with offline approaches.
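To make the online setting concrete, the following is a minimal sketch of a summarizer that decides whether to keep each frame as it arrives, without access to later frames. It is not MerryGoRoundNet: the function name `summarize_online`, the running-average `state`, the novelty score, and the `threshold` and `decay` parameters are all assumptions introduced for illustration, standing in for the learned recurrent memory and importance prediction described in the abstract.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def summarize_online(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.5,
    decay: float = 0.8,
) -> Iterator[Tuple[int, bool]]:
    """Yield (frame_index, keep) for each frame as it arrives.

    Hypothetical stand-in for a learned online summarizer: the running
    average `state` plays the role of the recurrent memory, and the
    novelty score replaces the network's importance prediction.
    """
    state: List[float] = []
    for index, features in enumerate(frames):
        if not state:
            # First frame: nothing to compare against, so keep it.
            state = list(features)
            yield index, True
            continue
        # Novelty = mean absolute difference from the running state.
        novelty = sum(abs(f - s) for f, s in zip(features, state)) / len(state)
        # Update the memory from the current frame only (no look-ahead).
        state = [decay * s + (1.0 - decay) * f for s, f in zip(state, features)]
        yield index, novelty > threshold


if __name__ == "__main__":
    # Toy per-frame feature vectors (assumed, not from any dataset).
    stream = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9], [0.0, 0.0]]
    kept = [i for i, keep in summarize_online(stream) if keep]
    print(kept)
```

Because the generator holds only a fixed-size state and never buffers the stream, its memory use does not grow with video length, which is the property that separates online summarization from offline methods that inspect the whole video first.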
Keywords
Streaming media,Task analysis,Decoding,Convolutional codes,Feature extraction,Neural networks,Cost function