Disentangling Motion, Foreground and Background Features in Videos

arXiv: Computer Vision and Pattern Recognition (2017)

Abstract
This paper introduces an unsupervised framework to extract semantically rich features for video representation. Inspired by how the human visual system groups objects based on motion cues, we propose a deep convolutional neural network that disentangles motion, foreground and background information. The proposed architecture consists of a 3D convolutional feature encoder for blocks of 16 frames, which is trained for reconstruction tasks over the first and last frames of the sequence. The model is trained with a fraction of videos from the UCF-101 dataset, taking as ground truth the bounding boxes around the activity regions. Qualitative results indicate that the network can successfully update the foreground appearance based on pure-motion features. The benefits of these learned features are shown in a discriminative classification task when compared with a random initialization of the network weights, providing an accuracy gain above 10%.
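The abstract describes a 3D convolutional encoder over 16-frame blocks whose latent code is partitioned into motion, foreground and background components. The following is a minimal NumPy sketch of that idea only; the layer shapes, the equal three-way split of the code, and all function names are illustrative assumptions, not the paper's actual architecture or training setup.

```python
import numpy as np

def conv3d(x, w, stride=2):
    """Valid (no padding) strided 3D convolution followed by ReLU.
    x: (C_in, T, H, W) video block; w: (C_out, C_in, kt, kh, kw) filters."""
    Cin, T, H, W = x.shape
    Cout, _, kt, kh, kw = w.shape
    To = (T - kt) // stride + 1
    Ho = (H - kh) // stride + 1
    Wo = (W - kw) // stride + 1
    out = np.zeros((Cout, To, Ho, Wo))
    for t in range(To):
        for i in range(Ho):
            for j in range(Wo):
                patch = x[:, t*stride:t*stride+kt,
                             i*stride:i*stride+kh,
                             j*stride:j*stride+kw]
                # Contract filters against the patch over (C_in, kt, kh, kw)
                out[:, t, i, j] = np.tensordot(
                    w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return np.maximum(out, 0.0)  # ReLU

def encode_and_split(clip, weights):
    """Encode a clip with stacked 3D convs, then split the flattened code
    into three equal parts (assumed motion / foreground / background)."""
    feats = clip
    for w in weights:
        feats = conv3d(feats, w)
    feats = feats.reshape(-1)
    k = feats.size // 3  # assumption: equal-sized disentangled parts
    return feats[:k], feats[k:2*k], feats[2*k:3*k]

# Demo on a random 16-frame RGB block (sizes chosen for illustration)
rng = np.random.default_rng(0)
clip = rng.normal(size=(3, 16, 32, 32))
weights = [rng.normal(size=(8, 3, 3, 3, 3)) * 0.1,
           rng.normal(size=(16, 8, 3, 3, 3)) * 0.1]
motion, fg, bg = encode_and_split(clip, weights)
print(motion.shape, fg.shape, bg.shape)  # three equal-length code vectors
```

In the paper, the motion component alone drives an update of the foreground appearance during reconstruction of the first and last frames; here the split merely illustrates the partitioned latent code.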