FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(2017)

Abstract
We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel level segmentation masks for all prominent objects in videos. We formulate this task as a structured prediction problem and design a two-stream fully convolutional neural network which fuses together motion and appearance in a unified framework. Since large-scale video datasets with pixel level segmentations are problematic, we show how to bootstrap weakly annotated videos together with existing image recognition datasets for training. Through experiments on three challenging video segmentation benchmarks, our method substantially improves the state-of-the-art for segmenting generic (unseen) objects. Code and pre-trained models are available on the project website.
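The abstract describes fusing pixel-level predictions from an appearance stream and a motion stream into a single segmentation mask. As an illustration only, the sketch below fuses two per-pixel score maps with an elementwise max and a 0.5 threshold; both the max fusion and the threshold are hypothetical choices for this example, not the paper's learned fusion layer.

```python
import numpy as np

def fuse_streams(appearance_scores, motion_scores, threshold=0.5):
    """Fuse per-pixel object scores from an appearance stream and a
    motion stream into a binary segmentation mask.

    The elementwise-max fusion and the 0.5 threshold are illustrative
    assumptions, not the paper's exact fusion mechanism.
    """
    fused = np.maximum(appearance_scores, motion_scores)  # pixel-wise max fusion
    return (fused >= threshold).astype(np.uint8)          # binary object mask

# Toy 2x2 example: each stream alone scores only one object pixel
# confidently, but the fused mask recovers both.
app = np.array([[0.9, 0.2],
                [0.1, 0.3]])
mot = np.array([[0.1, 0.8],
                [0.2, 0.4]])
mask = fuse_streams(app, mot)  # → [[1, 1], [0, 0]]
```

This captures why combining the two cues helps: a pixel salient in either appearance or motion survives into the final mask, whereas either stream alone would miss it.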
Keywords
FusionSeg,appearance,fully automatic segmentation,end-to-end learning framework,motion information,pixel level segmentation masks,prominent objects,structured prediction problem,unified framework,large-scale video datasets,pixel level segmentations,generic object segmentation,two-stream fully convolutional neural network design,video segmentation benchmarks,image recognition datasets,weakly annotated videos