
Global and Compact Video Context Embedding for Video Semantic Segmentation

IEEE Access (2024)

Abstract
Intuitively, global video context could benefit video semantic segmentation (VSS) if it simultaneously models global temporal and spatial dependencies for a holistic understanding of the semantic scenes in a video clip. However, we found that existing VSS approaches model only local video context. This paper attempts to bridge this gap by learning global video context for VSS. Besides being global, the video context should also be compact, given the large number of video feature tokens and the redundancy among nearby video frames. We then embed the learned global and compact video context into the features of the target video frame to improve their distinguishability. The proposed VSS method is dubbed Global and Compact Video Context Embedding (GCVCE). Owing to its compact nature, the number of global context tokens is very limited, making GCVCE flexible and efficient for VSS. Since directly abstracting a large number of video feature tokens into a small number of global context tokens may be too challenging, we further design a Cascaded Convolutional Downsampling (CCD) module before GCVCE to help it work better. A 1.6% improvement in mIoU on the popular VSPW dataset over previous state-of-the-art methods demonstrates the effectiveness and efficiency of GCVCE and CCD for VSS. Code and models will be made publicly available.
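The pipeline the abstract describes (downsample the video tokens, abstract them into a few global context tokens, then embed that context into the target frame's features) can be sketched as follows. This is a minimal illustrative sketch only: the function names, the pooling-based stand-in for the CCD module, and the cross-attention-style abstraction are assumptions, not the authors' actual implementation.

```python
import numpy as np

def _softmax(scores):
    # Row-wise softmax, numerically stabilized.
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

def cascaded_downsample(tokens, factor=4):
    # Hypothetical stand-in for the paper's CCD module: average-pool
    # groups of neighbouring tokens to shrink the token count before
    # global context abstraction (the real CCD uses cascaded convolutions).
    n, d = tokens.shape
    n_trim = (n // factor) * factor
    return tokens[:n_trim].reshape(-1, factor, d).mean(axis=1)

def abstract_global_context(tokens, queries):
    # A small set of learned query tokens attends over all (downsampled)
    # video tokens, producing one compact global context token per query.
    d = tokens.shape[1]
    attn = _softmax(queries @ tokens.T / np.sqrt(d))   # (k, n)
    return attn @ tokens                               # (k, d)

def embed_context(frame_feats, context):
    # Embed the compact global context back into the target frame's
    # features via attention from frame tokens to context tokens,
    # with a residual connection.
    d = frame_feats.shape[1]
    attn = _softmax(frame_feats @ context.T / np.sqrt(d))
    return frame_feats + attn @ context

# Toy usage: 640 video tokens of dim 32, abstracted into 8 context tokens.
rng = np.random.default_rng(0)
video_tokens = rng.standard_normal((640, 32))
context_queries = rng.standard_normal((8, 32))   # learned in practice
down = cascaded_downsample(video_tokens, factor=4)          # (160, 32)
ctx = abstract_global_context(down, context_queries)        # (8, 32)
frame_feats = rng.standard_normal((100, 32))
enhanced = embed_context(frame_feats, ctx)                  # (100, 32)
```

The key efficiency point from the abstract is visible here: after abstraction, only 8 context tokens (rather than all 640 video tokens) participate in the per-frame embedding step.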
Keywords
Video semantic segmentation, global video context, compact video context, video context embedding