Learned by deep neural network

Yuenan Li, Xuepiao Chen

Semantic Scholar (2017)

Abstract
In this paper, we propose to extract a robust video descriptor by training a deep neural network to automatically capture the intrinsic visual characteristics of digital video. More specifically, we first train a conditional generative model to capture the spatio-temporal correlations among visual contents and represent them as an intermediate descriptor. A nonlinear encoder, which performs dimension reduction and error correction, is then trained to learn a compressed yet more robust representation of the intermediate descriptor. The cascade of the conditional generative model and the encoder constitutes the building block of the deep network for learning the video descriptor. As a post-processing component, the top layers of the network are trained to optimize the robustness and discriminative capability of the output descriptor. Experimental results on benchmark databases confirm that the descriptor learned by the deep neural network shows excellent robustness against photometric, geometric, temporal, and combined distortions, and it attains an F1 score of 0.982 in content identification, much higher than hand-engineered descriptors.
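The pipeline described in the abstract (conditional generative model producing an intermediate descriptor, followed by a nonlinear dimension-reducing encoder) can be sketched as follows. This is only an illustrative NumPy mock-up under assumed shapes and random weights, not the paper's actual architecture or training procedure: the "generative" step here is a simple one-frame-ahead prediction whose residual serves as the intermediate descriptor, and the encoder is a single tanh projection with sign binarization.

```python
import numpy as np

rng = np.random.default_rng(0)

def intermediate_descriptor(frames, W_gen):
    # Illustrative conditional-generative step: predict each frame from
    # the previous one; the prediction residual reflects spatio-temporal
    # correlations among the visual contents.
    preds = np.tanh(frames[:-1] @ W_gen)   # predicted next-frame features
    residual = frames[1:] - preds          # spatio-temporal residual
    return residual.mean(axis=0)           # pool over time

def encode(desc, W_enc):
    # Illustrative nonlinear encoder: dimension reduction through a
    # squashing nonlinearity, then sign binarization for robustness
    # (binarization is an assumption, not taken from the paper).
    h = np.tanh(desc @ W_enc)
    return (h > 0).astype(np.uint8)

# Toy video: 10 frames of 64-dim features (hypothetical sizes).
frames = rng.standard_normal((10, 64))
W_gen = rng.standard_normal((64, 64)) * 0.1   # untrained stand-in weights
W_enc = rng.standard_normal((64, 16)) * 0.1

code = encode(intermediate_descriptor(frames, W_gen), W_enc)
```

In the paper this block (generative model plus encoder) is cascaded to form the deep network, with trained top layers replacing the random projections used here.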